perm filename MF.WEB[MF,DEK]1 blob
sn#737952 filedate 1984-01-19 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00021 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00003 00002 % This program is copyright (C) 1984 by D. E. Knuth all rights are reserved.
C00007 00003 @* \[1] Introduction.
C00040 00004 @* \[2] The character set.
C00054 00005 @* \[3] Input and output.
C00078 00006 @* \[4] String handling.
C00096 00007 @* \[5] On-line and off-line printing.
C00115 00008 @* \[6] Reporting errors.
C00140 00009 @* \[7] Arithmetic with scaled numbers.
C00158 00010 @* \[8] Algebraic and transcendental functions.
C00180 00011 @* \[9] Packed data.
C00197 00012 @* \[10] Dynamic memory allocation.
C00216 00013 @* \[11] Data structures for paths.
C00238 00014 @* \[12] Memory layout.
C00251 00015 @* \[13] Choosing control points.
C00298 00016 @* \[xx] File names.
C00329 00017 @* \[xx] Font metric data.
C00385 00018 @* \[xx] Device-independent file format.
C00425 00019 @* \[52] Debugging.
C00430 00020 @* \[54] System-dependent changes.
C00431 00021 @* \[55] Index.
C00433 ENDMK
C⊗;
% This program is copyright (C) 1984 by D. E. Knuth; all rights are reserved.
% Copying of this file is authorized only if (1) you are D. E. Knuth, or if
% (2) you make absolutely no changes to your copy. (The WEB system provides
% for alterations via an auxiliary file; the master file should stay intact.)
% In other words, METAFONT is under essentially the same ground rules as TeX.
% Version -1 is in preparation.
% A reward of $-5.12 will be paid to the first finder of any remaining bug.
% Places where I say "λλλ" must be fixed before I publish this!
% Here is TeX material that gets inserted after \input webmac
\def\hang{\hangindent 3em\noindent\ignorespaces}
\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
\font\ninerm=amr9
\let\mc=\ninerm % medium caps for names like PASCAL
\def\PASCAL{{\mc PASCAL}}
\def\ph{{\mc PASCAL-H}}
\font\logo=manfnt % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\section{\mathhexbox278}
\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section
\def\rhead{PART #2:\uppercase{#3}} % define running headline
\message{*\modno} % progress report
\edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next
\ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces}
\let\?=\relax % we want to be able to \write a \?
\def\title{{\logo opqrstuq}}
\def\topofcontents{\hsize 5.5in
\centerline{\logo ()*+,-.*\titlefont\ (more or less)}
\vfill
\def\?##1]{\hbox to 1in{\hfil##1.\ }}
}
\pageno=3
@* \[1] Introduction.
\def\glob{13} % this should be the section number of "<Global...>"
\def\gglob{20, 26} % this should be the next two sections of "<Global...>"
This is \MF, a font compiler intended to produce high-quality typefaces.
The \PASCAL\ program that follows is the definition of \MF84, a standard
@:PASCAL}{\PASCAL@>
@!@:METAFONT84}{\MF84@>
version of \MF\ that is designed to be highly portable so that identical output
will be obtainable on a great variety of different computers. Its conventions
are the same as those of \TeX82.
The main purpose of the following program is to explain the algorithms of \MF\
as clearly as possible. As a result, the program will not necessarily be very
efficient when a particular \PASCAL\ compiler has translated it into a
particular machine language. However, the program has been written so that it
can be tuned to run efficiently in a wide variety of operating environments
by making comparatively few changes. Such flexibility is possible because
the documentation that follows is written in the \.{WEB} language, which is
at a higher level than \PASCAL; the preprocessing step that converts \.{WEB}
to \PASCAL\ is able to introduce most of the necessary refinements.
Semi-automatic translation to other languages is also feasible, because the
program below does not make extensive use of features that are peculiar to
\PASCAL.
A large piece of software like \MF\ has inherent complexity that cannot
be reduced below a certain level of difficulty, although each individual
part is fairly simple by itself. The \.{WEB} language is intended to make
the algorithms as readable as possible, by reflecting the way the
individual program pieces fit together and by providing the
cross-references that connect different parts. Detailed comments about
what is going on, and about why things were done in certain ways, have
been liberally sprinkled throughout the program. These comments explain
features of the implementation, but they rarely attempt to explain the
\MF\ language itself, since the reader is supposed to be familiar with
{\sl The METAFONTbook}.
@.WEB@>
@:METAFONTbook}{\sl The METAFONTbook@>
@ The present implementation has a long ancestry, beginning in the spring
of~1977, when its author wrote a prototype set of subroutines and macros
@↑Knuth, Donald Ervin@>
that were used to develop the first Computer Modern fonts.
This original proto-\MF\ required the user to recompile a {\mc SAIL} program
whenever any character was changed, because it was not a ``language'' for
font design; the language was {\mc SAIL}. After several hundred characters
had been designed in that way, the author developed an interpretable language
called \MF, in which it was possible to express the Computer Modern programs
less cryptically. A complete \MF\ processor was designed and coded by the
author in 1979. This program, written in {\mc SAIL}, was adapted for use
with a variety of typesetting equipment and display terminals by Leo Guibas,
Lyle Ramshaw, and David Fuchs.
@↑Guibas, Leonidas Ioannis@>
@↑Ramshaw, Lyle Harold@>
@↑Fuchs, David Raymond@>
Major improvements to the design of Computer Modern fonts were made in the spring
of 1981, after which it became clear that a new language would better express
the needs of letterform designers. Therefore an entirely new \MF\ language and
system were designed in 1984; the present system retains the name and some of
the spirit of \MF79, but all of the details have changed.
No doubt there still is plenty of room for enhancements, but the author
is firmly committed to keeping \MF84 ``frozen'' from now on; stability
and reliability are to be its main virtues.
On the other hand, the \.{WEB} description can be extended without changing
the core of \MF84 itself, and the program has been designed so that such
extensions are not extremely difficult to make.
The |banner| string defined here should be changed whenever \MF\
undergoes any modifications, so that it will be clear which version of
\MF\ might be the guilty party when a problem arises.
@↑extensions to \MF@>
@↑system dependencies@>
@d banner=='This is METAFONT, Version -100.0' {printed when \MF\ starts}
@ Different \PASCAL s have slightly different conventions, and the present
\def\ph{{\mc PASCAL-H}}%
@!@:PASCAL H}{\ph@>
program expresses \MF\ in terms of the \PASCAL\ that was
available to the author in 1984. The methods used here to work with
this particular compiler, which we shall call \ph, should help the
reader to see how to make an appropriate interface for other systems
if necessary. (\ph\ is Charles Hedrick's modification of a compiler
@↑Hedrick, Charles Locke@>
for the DECsystem-10 that was originally developed at the University of
Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976),
29--42. The \MF\ program below is intended to be adaptable, without
extensive changes, to most other versions of \PASCAL, so it does not fully
use the admirable features of \ph. Indeed, a conscious effort has been
made here to avoid using several idiosyncratic features of standard
\PASCAL\ itself, so that most of the code can be mechanically translated
into other high-level languages. For example, the `\&{with}' and `\\{new}'
features are not used, nor are pointer types, set types, or enumerated
scalar types; there are no `\&{var}' parameters, except in the case of files;
there are no tag fields on variant records; there are no |real| variables;
no procedures are declared local to other procedures.)
The portions of this program that involve system-dependent code, where
changes might be necessary because of differences between \PASCAL\ compilers
and/or differences between
operating systems, can be identified by looking at the sections whose
numbers are listed under `system dependencies' in the index. Furthermore,
the index entries for `dirty \PASCAL' list all places where the restrictions
of \PASCAL\ have not been followed perfectly, for one reason or another.
@!@↑system dependencies@>
@!@↑dirty \PASCAL@>
@ The program begins with a normal \PASCAL\ program heading, whose
components will be filled in later, using the conventions of \.{WEB}.
@.WEB@>
For example, the portion of the program called `\X\glob:Global
variables\X' here will be replaced by a sequence of variable declarations
that starts in $\section\glob$ of this documentation. In this way, we are able
to define each individual global variable when we are prepared to
understand what it means; we do not have to define all of the globals at
once. Cross references in $\section\glob$, where it says ``See also
sections \gglob, \dots,'' also make it possible to look at the set of
all global variables, if desired. Similar remarks apply to the other
portions of the program heading.
Actually the heading shown here is not quite normal: The |program| line
does not mention any |output| file, because \ph\ would ask the \MF\ user
to specify a file name if |output| were specified here.
@↑system dependencies@>
@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:}
@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'}
@f type==true {but `|type|' will not be treated as a reserved word}
@p @t\4@>@<Compiler directives@>@/
program MF; {all file names are defined dynamically}
label @<Labels in the outer block@>@/
const @<Constants in the outer block@>@/
mtype @<Types in the outer block@>@/
var @<Global variables@>@/
@#
procedure initialize; {this procedure gets things started properly}
var @<Local variables for initialization@>@/
begin @<Initialize whatever \MF\ might access@>@;
end;@#
@t\4@>@<Basic printing procedures@>@/
@t\4@>@<Error handling procedures@>@/
@ The overall \MF\ program begins with the heading just shown, after which
comes a bunch of procedure declarations and function declarations.
Finally we will get to the main program, which begins with the
comment `|start_here|'. If you want to skip down to the
main program now, you can look up `|start_here|' in the index.
But the author suggests that the best way to understand this program
is to follow pretty much the order of \MF's components as they appear in the
\.{WEB} description you are now reading, since the present ordering is
intended to combine the advantages of the ``bottom up'' and ``top down''
approaches to the problem of understanding a somewhat complicated system.
@ Three labels must be declared in the main program, so we give them
symbolic names.
@d start_of_MF=1 {go here when \MF's variables are initialized}
@d end_of_MF=9998 {go here to close files and terminate gracefully}
@d final_end=9999 {this label marks the ending of the program}
@<Labels in the out...@>=
start_of_MF@t\hskip-2pt@>, end_of_MF@t\hskip-2pt@>,@,final_end;
{key control points}
@ Some of the code below is intended to be used only when diagnosing the
strange behavior that sometimes occurs when \MF\ is being installed or
when system wizards are fooling around with \MF\ without quite knowing
what they are doing. Such code will not normally be compiled; it is
delimited by the codewords `$|debug|\ldots|gubed|$', with apologies
to people who wish to preserve the purity of English. Similarly, there
is some conditional code delimited by `$|stat|\ldots|tats|$'
that is intended only for use when statistics
are to be kept about \MF's memory usage.
@↑debugging@>
@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
@f debug==begin
@f gubed==end
@#
@d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering
usage statistics}
@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering
usage statistics}
@f stat==begin
@f tats==end
@ This program has two important variations: (1) There is a long and slow
version called \.{INIMF}, which does the extra calculations needed to
@.INIMF@>
initialize \MF's internal tables; and (2)~there is a shorter and faster
production version, which cuts the initialization to a bare minimum.
Parts of the program that are needed in (1) but not in (2) are delimited by
the codewords `$|init|\ldots|tini|$'.
@d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version}
@d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version}
@f init==begin
@f tini==end
@<Initialize whatever...@>=
@<Set initial values of key variables@>@/
@!init @<Initialize table entries (done by \.{INIMF} only)@>@;@+tini
@ If the first character of a \PASCAL\ comment is a dollar sign,
\ph\ treats the comment as a list of ``compiler directives'' that will
affect the translation of this program into machine language. The
directives shown below specify full checking and inclusion of the \PASCAL\
debugger when \MF\ is being debugged, but they cause range checking and other
redundant code to be eliminated when the production system is being generated.
Arithmetic overflow will be detected in all cases.
@↑system dependencies@>
@↑Overflow in arithmetic@>
@<Compiler directives@>=
@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
@ This \MF\ implementation conforms to the rules of the {\sl PASCAL User
@:PASCAL}{\PASCAL@>
@↑system dependencies@>
Manual} published by Jensen and Wirth in 1975, except where system-dependent
@↑Wirth, Niklaus@>
@↑Jensen, Kathleen@>
code is necessary to make a useful system program, and except in another
respect where such conformity would unnecessarily obscure the meaning
and clutter up the code: We assume that |case| statements may include a
default case that applies if no matching label is found. Thus, we shall use
constructions like
$$\vbox{\halign{\ignorespaces#\hfil\cr
|case x of|\cr
1: $\langle\,$code for $x=1\,\rangle$;\cr
3: $\langle\,$code for $x=3\,\rangle$;\cr
|othercases| $\langle\,$code for |x≠1| and |x≠3|$\,\rangle$\cr
|endcases|\cr}}$$
since most \PASCAL\ compilers have plugged this hole in the language by
incorporating some sort of default mechanism. For example, the \ph\
compiler allows `|others|:' as a default label, and other \PASCAL s allow
syntaxes like `\ignorespaces|else|\unskip' or `\\{otherwise}' or
`\\{otherwise}:', etc. The definitions of |othercases| and |endcases|
should be changed to agree with local conventions. Note that no semicolon
appears before |endcases| in this program, so the definition of |endcases|
should include a semicolon if the compiler wants one. (Of course, if no
default mechanism is available, the |case| statements of \MF\ will have
to be laboriously extended by listing all remaining cases. People who are
stuck with such \PASCAL s have in fact done this, successfully but not
happily!)
@d othercases == others: {default for cases not listed explicitly}
@d endcases == @+end {follows the default case in an extended |case| statement}
@f othercases == else
@f endcases == end
@ The following parameters can be changed at compile time to extend or
reduce \MF's capacity. They may have different values in \.{INIMF} and
in production versions of \MF.
@.INIMF@>
@↑system dependencies@>
@<Constants...@>=
@!mem_max=30000; {greatest index in \MF's internal |mem| array,
must be strictly less than |max_halfword|}
@!buf_size=500; {maximum number of characters simultaneously present in
current lines of open files; must not exceed |max_halfword|}
@!error_line=72; {width of context lines on terminal error messages}
@!half_error_line=42; {width of first lines of contexts in terminal
error messages, should be between 30 and |error_line-15|}
@!max_print_line=79; {width of longest text lines output, should be at least 60}
@!stack_size=30; {maximum number of simultaneous input sources}
@!max_in_open=6; {maximum number of input files and error insertions that
can be going on simultaneously}
@!param_size=30; {maximum number of simultaneous macro parameters}
@!max_strings=1500; {maximum number of strings; must not exceed |max_halfword|}
@!string_vacancies=8000; {the minimum number of characters that should be
available for the user's identifier names,
after \MF's own error messages are stored}
@!pool_size=32000; {maximum number of characters in strings, including all
error messages and help texts, and the names of all identifiers;
must exceed |string_vacancies| by the total
length of \MF's own strings, which is currently about λλλ}
@!save_size=200; {space for saving values outside of current group, must be
at most |max_halfword|}
@!dvi_buf_size=800; {size of the output buffer, must be a multiple of 8}
@!file_name_size=40; {file names shouldn't be longer than this}
@!pool_name='TeXformats:MF.POOL ';
{string of length |file_name_size|, tells where the string pool appears}
@.TeXformats@>
@!path_size=100; {maximum number of knots between breakpoints of a path}
@ Like the preceding parameters, the following quantities can be changed
at compile time to extend or reduce \MF's capacity. But if they are changed,
it is necessary to rerun the initialization program \.{INIMF}
@.INIMF@>
to generate new tables for the production \MF\ program.
One can't simply make helter-skelter changes to the following constants,
since certain rather complex initialization
numbers are computed from them. They are defined here using
\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
emphasize this distinction.
@d mem_base=0 {smallest index in the |mem| array, must not be less
than |min_halfword|}
@d hi_mem_base=13000 {smallest index in the single-word area of |mem|,
must be substantially larger than |mem_base| and smaller than |mem_max|}
@d hash_size=2100 {maximum number of identifiers}
@d hash_prime=1777 {a prime number equal to about 85\% of |hash_size|}
@↑system dependencies@>
@ In case somebody has inadvertently made bad settings of the ``constants,''
\MF\ checks them using a global variable called |bad|.
This is the first of many sections of \MF\ where global variables are
defined.
@<Glob...@>=
@!bad:integer; {is some ``constant'' wrong?}
@ Later on we will say `\ignorespaces|if mem_max≥max_halfword then bad←10|',
or something similar. (We can't do that until |max_halfword| has been defined.)
@<Check the ``constant'' values for consistency@>=
bad←0;
if (half_error_line<30)∨(half_error_line>error_line-15) then bad←1;
if max_print_line<60 then bad←2;
if dvi_buf_size mod 8≠0 then bad←3;
if (hi_mem_base<mem_base+100)∨(hi_mem_base+100>mem_max) then bad←4;
if hash_prime>hash_size then bad←5;
@ Labels are given symbolic names by the following definitions, so that
occasional |goto| statements will be meaningful. We insert the label
`|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in
which we have used the `|return|' statement defined below; the label
`|restart|' is occasionally used at the very beginning of a procedure; and
the label `|reswitch|' is occasionally used just prior to a |case|
statement in which some cases change the conditions and we wish to branch
to the newly applicable case. Loops that are set up with the |loop|
construction defined below are commonly exited by going to `|done|' or to
`|found|' or to `|not_found|', and they are sometimes repeated by going to
`|continue|'. If two or more parts of a subroutine start differently but
end up the same, the shared code may be gathered together at
`|common_ending|'.
Incidentally, this program never declares a label that isn't actually used,
because some fussy \PASCAL\ compilers will complain about redundant labels.
@d exit=10 {go here to leave a procedure}
@d restart=20 {go here to start a procedure again}
@d reswitch=21 {go here to start a case statement again}
@d continue=22 {go here to resume a loop}
@d done=30 {go here to exit a loop}
@d done1=31 {like |done|, when there is more than one loop}
@d done2=32 {for exiting the second loop in a long block}
@d done3=33 {for exiting the third loop in a very long block}
@d done4=34 {for exiting the fourth loop in an extremely long block}
@d done5=35 {for exiting the fifth loop in an immense block}
@d done6=36 {for exiting the sixth loop in a block}
@d found=40 {go here when you've found it}
@d found1=41 {like |found|, when there's more than one per routine}
@d found2=42 {like |found|, when there's more than two per routine}
@d not_found=45 {go here when you've found nothing}
@d common_ending=50 {go here when you want to merge with another branch}
@ Here are some macros for common programming idioms.
@d incr(#) == #←#+1 {increase a variable by unity}
@d decr(#) == #←#-1 {decrease a variable by unity}
@d negate(#) == #←-# {change the sign of a variable}
@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
@f loop == xclause
{\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'}
@d do_nothing == {empty statement}
@d return == goto exit {terminate a procedure call}
@f return == nil
@d empty=0 {symbolic name for a null constant}
@* \[2] The character set.
In order to make \MF\ readily portable between a wide variety of
computers, all of its input text is converted to an internal seven-bit
code that is essentially standard ASCII, the ``American Standard Code for
Information Interchange.'' This conversion is done immediately when each
character is read in. Conversely, characters are converted from ASCII to
the user's external representation just before they are output to a
text file.
@↑ASCII code@>
Such an internal code is relevant to users of \MF\ only with respect to
constants that begin with a reverse apostrophe.
@ Characters of text that have been converted to \MF's internal form
are said to be of type |ASCII_code|, which is a subrange of the integers.
@<Types...@>=
@!ASCII_code=0..127; {seven-bit numbers}
@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
character sets were common, so it did not make provision for lowercase
letters. Nowadays, of course, we need to deal with both capital and small
letters in a convenient way, especially in a program for font design;
so the present specification of \MF\ has been written under the assumption
that the \PASCAL\ compiler and run-time system permit the use of text files
with more than 64 distinguishable characters. More precisely, we assume that
the character set contains at least the letters and symbols associated
with ASCII codes @'40 through @'176; all of these characters are now
available on most computer terminals.
Since we are dealing with more characters than were present in the first
\PASCAL\ compilers, we have to decide what to call the associated data
type. Some \PASCAL s use the original name |char| for the
characters in text files, even though there now are more than 64 such
characters, while other \PASCAL s consider |char| to be a 64-element
subrange of a larger data type that has some other name.
In order to accommodate this difference, we shall use the name |text_char|
to stand for the data type of the characters that are converted to and
from |ASCII_code| when they are input and output. We shall also assume
that |text_char| consists of the elements |chr(first_text_char)| through
|chr(last_text_char)|, inclusive. The following definitions should be
adjusted if necessary.
@↑system dependencies@>
@d text_char == char {the data type of characters in text files}
@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
@d last_text_char=127 {ordinal number of the largest element of |text_char|}
@<Local variables for init...@>=
i:0..last_text_char;
@ The \MF\ processor converts between ASCII code and
the user's external character set by means of arrays |xord| and |xchr|
that are analogous to \PASCAL's |ord| and |chr| functions.
@<Glob...@>=
@!xord: array [text_char] of ASCII_code;
{specifies conversion of input characters}
@!xchr: array [ASCII_code] of text_char;
{specifies conversion of output characters}
@ Since we are assuming that our \PASCAL\ system is able to read and write the
visible characters of standard ASCII (although not necessarily using the
ASCII codes to represent them), the following assignment statements initialize
most of the |xchr| array properly, without needing any system-dependent
changes. On the other hand, it is possible to implement \MF\ with
less complete character sets, and in such cases it will be necessary to
change something here.
@↑system dependencies@>
@<Set init...@>=
xchr[@'40]←' ';
xchr[@'41]←'!';
xchr[@'42]←'"';
xchr[@'43]←'#';
xchr[@'44]←'$';
xchr[@'45]←'%';
xchr[@'46]←'&';
xchr[@'47]←'''';@/
xchr[@'50]←'(';
xchr[@'51]←')';
xchr[@'52]←'*';
xchr[@'53]←'+';
xchr[@'54]←',';
xchr[@'55]←'-';
xchr[@'56]←'.';
xchr[@'57]←'/';@/
xchr[@'60]←'0';
xchr[@'61]←'1';
xchr[@'62]←'2';
xchr[@'63]←'3';
xchr[@'64]←'4';
xchr[@'65]←'5';
xchr[@'66]←'6';
xchr[@'67]←'7';@/
xchr[@'70]←'8';
xchr[@'71]←'9';
xchr[@'72]←':';
xchr[@'73]←';';
xchr[@'74]←'<';
xchr[@'75]←'=';
xchr[@'76]←'>';
xchr[@'77]←'?';@/
xchr[@'100]←'@@';
xchr[@'101]←'A';
xchr[@'102]←'B';
xchr[@'103]←'C';
xchr[@'104]←'D';
xchr[@'105]←'E';
xchr[@'106]←'F';
xchr[@'107]←'G';@/
xchr[@'110]←'H';
xchr[@'111]←'I';
xchr[@'112]←'J';
xchr[@'113]←'K';
xchr[@'114]←'L';
xchr[@'115]←'M';
xchr[@'116]←'N';
xchr[@'117]←'O';@/
xchr[@'120]←'P';
xchr[@'121]←'Q';
xchr[@'122]←'R';
xchr[@'123]←'S';
xchr[@'124]←'T';
xchr[@'125]←'U';
xchr[@'126]←'V';
xchr[@'127]←'W';@/
xchr[@'130]←'X';
xchr[@'131]←'Y';
xchr[@'132]←'Z';
xchr[@'133]←'[';
xchr[@'134]←'\';
xchr[@'135]←']';
xchr[@'136]←'↑';
xchr[@'137]←'_';@/
xchr[@'140]←'`';
xchr[@'141]←'a';
xchr[@'142]←'b';
xchr[@'143]←'c';
xchr[@'144]←'d';
xchr[@'145]←'e';
xchr[@'146]←'f';
xchr[@'147]←'g';@/
xchr[@'150]←'h';
xchr[@'151]←'i';
xchr[@'152]←'j';
xchr[@'153]←'k';
xchr[@'154]←'l';
xchr[@'155]←'m';
xchr[@'156]←'n';
xchr[@'157]←'o';@/
xchr[@'160]←'p';
xchr[@'161]←'q';
xchr[@'162]←'r';
xchr[@'163]←'s';
xchr[@'164]←'t';
xchr[@'165]←'u';
xchr[@'166]←'v';
xchr[@'167]←'w';@/
xchr[@'170]←'x';
xchr[@'171]←'y';
xchr[@'172]←'z';
xchr[@'173]←'{';
xchr[@'174]←'|';
xchr[@'175]←'}';
xchr[@'176]←'~';@/
xchr[0]←' '; xchr[@'177]←' ';
{ASCII codes 0 and |@'177| do not appear in text}
@ Some of the ASCII codes without visible characters have been given symbolic
names in this program because they are used with a special meaning.
@d null_code=@'0 {ASCII code that might disappear}
@d carriage_return=@'15 {ASCII code used at end of line}
@d invalid_code=@'177 {ASCII code that should not appear}
@ The ASCII code is ``standard'' only to a certain extent, since many
computer installations have found it advantageous to have ready access
to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/}
gives a complete specification of the intended correspondence between
characters and \MF's internal representation.
@:TeXbook}{\sl The TeXbook@>
If \MF\ is being used
on a garden-variety \PASCAL\ for which only standard ASCII
codes will appear in the input and output files, it doesn't really matter
what codes are specified in |xchr[1..@'37]|, but the safest policy is to
blank everything out by using the code shown below.
However, other settings of |xchr| will make \MF\ more friendly on
computers that have an extended character set, so that users can type things
like `\.↑↑Z' instead of `\.{\\ne}'. At MIT, for example, it would be more
appropriate to substitute the code
$$\hbox{|for i←1 to @'37 do xchr[i]←chr(i);|}$$
\MF's character set is essentially the same as MIT's, even with respect to
characters less than~@'40. People with extended character sets can
assign codes arbitrarily, giving an |xchr| equivalent to whatever
characters the users of \MF\ are allowed to have in their input files.
It is best to make the codes correspond to the intended interpretations as
shown in Appendix~C whenever possible; but this is not necessary. For
example, in countries with an alphabet of more than 26 letters, it is
usually best to map the additional letters into codes less than~@'40.
Appropriate changes to \MF's category code table should then be made.
(Unlike \TeX, each installation of \MF\ has a fixed assignment of category
codes.)
@↑character set dependencies@>
@↑system dependencies@>
@<Set init...@>=
for i←1 to @'37 do xchr[i]←' ';
@ The following system-independent code makes the |xord| array contain a
suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]|
where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be
|j| or more; hence, standard ASCII code numbers will be used instead of
codes below @'40 in case there is a coincidence.
@<Set init...@>=
for i←first_text_char to last_text_char do xord[chr(i)]←invalid_code;
for i←1 to @'176 do xord[xchr[i]]←i;
@* \[3] Input and output.
The bane of portability is the fact that different operating systems treat
input and output quite differently, perhaps because computer scientists
have not given sufficient attention to this problem. People have felt somehow
that input and output are not a part of ``real'' programming. Well, it is true
that some kinds of programming are more fun than others. With existing
input/output conventions being so diverse and so messy, the only sources of
joy in such parts of the code are the rare occasions when one can find a
way to make the program a little less bad than it might have been. We have
two choices: whether to attack I/O now and get it over with, or to postpone
it until near the end. Neither prospect is very attractive, so let's
get it over with.
The basic operations we need to do are (1)~inputting and outputting of
text, to or from a file or the user's terminal; (2)~inputting and
outputting of eight-bit bytes, to or from a file; (3)~instructing the
operating system to initiate (``open'') or to terminate (``close'') input or
output from a specified file; (4)~testing whether the end of an input
file has been reached; (5)~display of bits on the user's screen.
The bit-display operation will be discussed in a later section; we shall
deal here only with more traditional kinds of I/O.
Note that \MF\ needs to deal with only two kinds of files.
We shall use the term |alpha_file| for a file that contains textual data,
and the term |byte_file| for a file that contains eight-bit binary information.
These two types turn out to be the same on many computers, but
sometimes there is a significant distinction, so we shall be careful to
distinguish between them. Standard protocols for transferring
such files from computer to computer, via high-speed networks, are
now becoming available to more and more communities of users.
\MF\ actually makes use also of a third kind of file, called a |word_file|,
when dumping and reloading package information for its own initialization.
We shall define a word file later; but it will be possible for us to
do a few simple things with them in this section before they are defined.
@<Types...@>=
@!eight_bits=0..255; {unsigned one-byte quantity}
@!alpha_file=packed file of text_char; {files that contain textual data}
@!byte_file=packed file of eight_bits; {files that contain binary data}
@ Most of what we need to do with respect to input and output can be handled
by the I/O facilities that are standard in \PASCAL, i.e., the routines
called |get|, |put|, |eof|, and so on. But
standard \PASCAL\ does not allow file variables to be associated with file
names that are determined at run time, so it cannot be used to implement
\MF; some sort of extension to \PASCAL's ordinary |reset| and |rewrite|
is crucial for our purposes. We shall assume that |name_of_file| is a variable
of an appropriate type such that the \PASCAL\ run-time system being used to
implement \MF\ can open a file whose external name is specified by
|name_of_file|.
@↑system dependencies@>
@<Glob...@>=
@!name_of_file:packed array[1..file_name_size] of char;@;@/
{on some systems this may be a \&{record} variable}
@!name_length:0..file_name_size;@/{this many characters are actually
relevant in |name_of_file| (the rest are blank)}
@ The \ph\ compiler with which the present version of \MF\ was prepared has
extended the rules of \PASCAL\ in a very convenient way. To open file~|f|,
we can write
$$\vbox{\halign{#\hfil\qquad\hfil\cr
|reset(f,@t\\{name}@>,'/O')|&for input;\cr
|rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$
The `\\{name}' parameter, which is of type `\ignorespaces|packed
array[@t\<\\{any}>@>] of text_char|', stands for the name of
the external file that is being opened for input or output.
Blank spaces that might appear in \\{name} are ignored.
The `\.{/O}' parameter tells the operating system not to issue its own
error messages if something goes wrong. If a file of the specified name
cannot be found, or if such a file cannot be opened for some other reason
(e.g., someone may already be trying to write the same file), we will have
|@!erstat(f)≠0| after an unsuccessful |reset| or |rewrite|. This allows
\MF\ to undertake appropriate corrective action.
@:PASCAL H}{\ph@>
@↑system dependencies@>
\MF's file-opening procedures return |false| if no file identified by
|name_of_file| could be opened.
@d reset_OK(#)==erstat(#)=0
@d rewrite_OK(#)==erstat(#)=0
@p function a_open_in(var f:alpha_file):boolean;
{open a text file for input}
begin reset(f,name_of_file,'/O'); a_open_in←reset_OK(f);
end;
@#
function a_open_out(var f:alpha_file):boolean;
{open a text file for output}
begin rewrite(f,name_of_file,'/O'); a_open_out←rewrite_OK(f);
end;
@#
function b_open_in(var f:byte_file):boolean;
{open a binary file for input}
begin reset(f,name_of_file,'/O'); b_open_in←reset_OK(f);
end;
@#
function b_open_out(var f:byte_file):boolean;
{open a binary file for output}
begin rewrite(f,name_of_file,'/O'); b_open_out←rewrite_OK(f);
end;
@#
function w_open_in(var f:word_file):boolean;
{open a word file for input}
begin reset(f,name_of_file,'/O'); w_open_in←reset_OK(f);
end;
@#
function w_open_out(var f:word_file):boolean;
{open a word file for output}
begin rewrite(f,name_of_file,'/O'); w_open_out←rewrite_OK(f);
end;
@ Files can be closed with the \ph\ routine `|close(f)|', which
@↑system dependencies@>
should be used when all input or output with respect to |f| has been completed.
This makes |f| available to be opened again, if desired; and if |f| was used for
output, the |close| operation makes the corresponding external file appear
on the user's area, ready to be read.
@p procedure a_close(var f:alpha_file); {close a text file}
begin close(f);
end;
@#
procedure b_close(var f:byte_file); {close a binary file}
begin close(f);
end;
@#
procedure w_close(var f:word_file); {close a word file}
begin close(f);
end;
@ Binary input and output are done with \PASCAL's ordinary |get| and |put|
procedures, so we don't have to make any other special arrangements for
binary~I/O. Text output is also easy to do with standard \PASCAL\ routines.
The treatment of text input is more difficult, however, because
of the necessary translation to |ASCII_code| values, and because
\MF's conventions should be efficient and they should
blend nicely with the user's operating environment.
@ Input from text files is read one line at a time, using a routine called
|input_ln|. This function is defined in terms of global variables called
|buffer|, |first|, and |last| that will be described in detail later; for
now, it suffices for us to know that |buffer| is an array of |ASCII_code|
values, and that |first| and |last| are indices into this array
representing the beginning and ending of a line of text.
@<Glob...@>=
@!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read}
@!first:0..buf_size; {the first unused position in |buffer|}
@!last:0..buf_size; {end of the line just input to |buffer|}
@!max_buf_stack:0..buf_size; {largest index used in |buffer|}
@ The |input_ln| function brings the next line of input from the specified
field into available positions of the buffer array and returns the value
|true|, unless the file has already been entirely read, in which case it
returns |false| and sets |last←first|. In general, the |ASCII_code|
numbers that represent the next line of the file are input into
|buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the
global variable |last| is set equal to |first| plus the length of the
line. Trailing blanks are removed from the line; thus, either |last=first|
(in which case the line was entirely blank) or |buffer[last-1]≠" "|.
An overflow error is given, however, if the normal actions of |input_ln|
would make |last≥buf_size|; this is done so that other parts of \MF\
can safely look at the contents of |buffer[last+1]| without overstepping
the bounds of the |buffer| array. Upon entry to |input_ln|, the condition
|first<buf_size| will always hold, so that there is always room for an
``empty'' line.
The variable |max_buf_stack|, which is used to keep track of how large
the |buf_size| parameter must be to accommodate the present job, is
also kept up to date by |input_ln|.
If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get|
before looking at the first character of the line; this skips over
an |eoln| that was in |f↑|. The procedure does not do a |get| when it
reaches the end of the line; therefore it can be used to acquire input
from the user's terminal as well as from ordinary text files.
Standard \PASCAL\ says that a file should have |eoln| immediately
before |eof|, but \MF\ needs only a weaker restriction: If |eof|
occurs in the middle of a line, the system function |eoln| should return
a |true| result (even though |f↑| will be undefined).
@p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean;
{inputs the next line or returns |false|}
var last_nonblank:0..buf_size; {|last| with trailing blanks removed}
begin if bypass_eoln then if not eof(f) then get(f);
{input the first character of the line into |f↑|}
last←first; {cf.\ Matthew 19:30}
if eof(f) then input_ln←false
else begin last_nonblank←first;
while not eoln(f) do
begin if last≥max_buf_stack then
begin max_buf_stack←last+1;
if max_buf_stack=buf_size then
overflow("buffer size",buf_size);
@:METAFONT capacity exceeded buffer size}{\quad buffer size@>
end;
buffer[last]←xord[f↑]; get(f); incr(last);
if buffer[last-1]≠" " then last_nonblank←last;
end;
last←last_nonblank; input_ln←true;
end;
end;
@ The user's terminal acts essentially like other files of text, except
that it is used both for input and for output. When the terminal is
considered an input file, the file variable is called |term_in|, and when it
is considered an output file the file variable is |term_out|.
@↑system dependencies@>
@<Glob...@>=
@!term_in:alpha_file; {the terminal as an input file}
@!term_out:alpha_file; {the terminal as an output file}
@ Here is how to open the terminal files
in \ph. The `\.{/I}' switch suppresses the first |get|.
@↑system dependencies@>
@d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input}
@d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output}
@ Sometimes it is necessary to synchronize the input/output mixture that
happens on the user's terminal, and three procedures are used for this
purpose. The first of these, |update_terminal|, is called when we want
to make sure that everything we have output to the terminal so far has
actually left the computer's internal buffers and been sent.
The second, |clear_terminal|, is called when we wish to cancel any
input that the user may have typed ahead (since we are about to
issue an unexpected error message). The third, |wake_up_terminal|,
is supposed to revive the terminal if the user has disabled it by
some instruction to the operating system. The following macros show how
these operations can be specified in \ph:
@↑system dependencies@>
@d update_terminal == break(term_out) {empty the terminal output buffer}
@d clear_terminal == break_in(term_in,true) {clear the terminal input buffer}
@d wake_up_terminal == do_nothing {cancel the user's cancellation of output}
@ We need a special routine to read the first line of \MF\ input from
the user's terminal. This line is different because it is read before we
have opened the transcript file; there is sort of a ``chicken and
egg'' problem here. If the user types `\.{input paper}' on the first
line, or if some macro invoked by that line does such an \.{input},
the transcript file will be named `\.{paper.log}'; but if no \.{input}
commands are performed during the first line of terminal input, the transcript
file will acquire its default name `\.{mfput.log}'. (The transcript file
will not contain error messages generated by the first line before the
first \.{input} command.)
The first line is even more special if we are lucky enough to have an operating
system that treats \MF\ differently from a run-of-the-mill \PASCAL\ object
program. It's nice to let the user start running a \MF\ job by typing
a command line like `\.{MF cmr10}'; in such a case, \MF\ will operate
as if the first line of input were `\.{cmr10}', i.e., the first line will
consist of the remainder of the command line, after the part that invoked \MF.
@ Different systems have different ways to get started, but regardless of
what conventions are adopted the routine that initializes the terminal
should satisfy the following specifications:
\yskip\textindent{1)}It should open file |term_in| for input from the
terminal. (The file |term_out| will already be open for output to the
terminal.)
\textindent{2)}If the user has given a command line, this line should be
considered the first line of terminal input. Otherwise the
user should be prompted with `\.{**}', and the first line of input
should be whatever is typed in response.
\textindent{3)}The first line of input, which might or might not be a
command line, should appear in locations |first| to |last-1| of the
|buffer| array.
\textindent{4)}The global variable |loc| should be set so that the
character that \MF\ reads next is in |buffer[loc]|. This
character should not be blank, and we should have |loc<last|.
\yskip\noindent(It may be necessary to prompt the user several times
before a non-blank line comes in. The prompt is `\.{**}' instead of the
later `\.*' because the meaning is slightly different: `\.{input}' need
not be typed immediately after~`\.{**}'.)
@d loc==cur_input.loc_field {location of first unread character in |buffer|}
@ The following program does the required initialization
without retrieving a possible command line.
It should be clear how to modify this routine to deal with command lines,
if the system permits them.
@↑system dependencies@>
@p function init_terminal:boolean; {gets the terminal input started}
label exit;
begin t_open_in;
loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal;
@.**@>
if not input_ln(term_in,true) then {this shouldn't happen}
begin write_ln(term_out);
write(term_out,'! End of file on the terminal... why?');
@.End of file on the terminal@>
init_terminal←false; return;
end;
loc←first;
while (loc<last)∧(buffer[loc]=" ") do incr(loc);
if loc<last then
begin init_terminal←true;
return; {return unless the line was all blank}
end;
write_ln(term_out,'Please type the name of your input file.');
end;
exit:end;
@* \[4] String handling.
Control sequence names and diagnostic messages are variable-length strings
of seven-bit characters. Since \PASCAL\ does not have a well-developed string
mechanism, \MF\ does all of its string processing by homegrown methods.
Elaborate facilities for dynamic strings are not needed, so all of the
necessary operations can be handled with a simple data structure.
The array |str_pool| contains all of the (seven-bit) ASCII codes in all
of the strings, and the array |str_start| contains indices of the starting
points of each string. Strings are referred to by integer numbers, so that
string number |s| comprises the characters |str_pool[j]| for
|str_start[s]≤j<str_start[s+1]|. Additional integer variables
|pool_ptr| and |str_ptr| indicate the number of entries used so far
in |str_pool| and |str_start|, respectively; locations
|str_pool[pool_ptr]| and |str_start[str_ptr]| are
ready for the next string to be allocated.
String numbers 0 to 127 are reserved for strings that correspond to single
ASCII characters. This is in accordance with the conventions of \.{WEB},
@.WEB@>
which converts single-character strings into the ASCII code number of the
single character involved, while it converts other strings into integers
and builds a string pool file. Thus, when the string constant \.{"."} appears
in the program below, \.{WEB} converts it into the integer 46, which is the
ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"}
into some integer greater than~127. String number 46 will presumably be the
single character `\..'; but some ASCII codes have no standard visible
representation, and \MF\ sometimes needs to be able to print an arbitrary
ASCII character, so the first 128 strings are used to specify exactly what
should be printed for each of the 128 possibilities.
Elements of the |str_pool| array must be ASCII codes that can actually be
printed; i.e., they must have an |xchr| equivalent in the local
character set.
@<Types...@>=
@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|}
@!str_number = 0..max_strings; {for variables that point into |str_start|}
@ @<Glob...@>=
@!str_pool:packed array[pool_pointer] of ASCII_code; {the characters}
@!str_start : array[str_number] of pool_pointer; {the starting pointers}
@!pool_ptr : pool_pointer; {first unused position in |str_pool|}
@!str_ptr : str_number; {start of the current string being created}
@!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|}
@!init_str_ptr : str_number; {the starting value of |str_ptr|}
@ Several of the elementary string operations are performed using \.{WEB}
macros instead of using \PASCAL\ procedures, because many of the
operations are done quite frequently and we want to avoid the
overhead of procedure calls. For example, here is
a simple macro that computes the length of a string.
@.WEB@>
@d length(#)==(str_start[#+1]-str_start[#]) {the number of characters
in string number \#}
@ The length of the current string is called |cur_length|:
@d cur_length == (pool_ptr - str_start[str_ptr])
@ Strings are created by appending character codes to |str_pool|.
The macro called |append_char|, defined here, does not check to see if the
value of |pool_ptr| has gotten too high; this test is supposed to be
made before |append_char| is used. There is also a |flush_char|
macro, which erases the last character appended.
To test if there is room to append |l| more characters to |str_pool|,
we shall write |str_room(l)|, which aborts \MF\ and gives an
apologetic error message if there isn't enough room.
@d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|}
begin str_pool[pool_ptr]←#; incr(pool_ptr);
end
@d flush_char == decr(pool_ptr) {forget the last character in the pool}
@d str_room(#) == {make sure that the pool hasn't overflowed}
begin if pool_ptr+# > pool_size then
overflow("pool size",pool_size-init_pool_ptr);
@:METAFONT capacity exceeded pool size}{\quad pool size@>
end
@ Once a sequence of characters has been appended to |str_pool|, it
officially becomes a string when the function |make_string| is called.
This function returns the identification number of the new string as its
value.
@p function make_string : str_number; {current string enters the pool}
begin if str_ptr=max_strings then
overflow("number of strings",max_strings-init_str_ptr);
@:METAFONT capacity exceeded number of strings}{\quad number of strings@>
incr(str_ptr); str_start[str_ptr]←pool_ptr;
make_string←str_ptr-1;
end;
@ To destroy the most recently made string, we say |flush_string|.
@d flush_string==begin decr(str_ptr); pool_ptr←str_start[str_ptr];
end
@ The following subroutine compares string |s| with another string of the
same length that appears in |buffer| starting at position |k|;
the result is |true| if and only if the strings are equal.
@p function str_eq_buf(@!s:str_number;@!k:integer):boolean;
{test equality of strings}
label not_found; {loop exit}
var j: pool_pointer; {running index}
@!result: boolean; {result of comparison}
begin j←str_start[s];
while j<str_start[s+1] do
begin if str_pool[j]≠buffer[k] then
begin result←false; goto not_found;
end;
incr(j); incr(k);
end;
result←true;
not_found: str_eq_buf←result;
end;
@ Here is a similar routine, but it compares two strings in the string pool,
and it does not assume that they have the same length.
@p function str_eq_str(@!s,@!t:str_number):boolean;
{test equality of strings}
label not_found; {loop exit}
var j,@!k: pool_pointer; {running indices}
@!result: boolean; {result of comparison}
begin result←false;
if length(s)≠length(t) then goto not_found;
j←str_start[s]; k←str_start[t];
while j<str_start[s+1] do
begin if str_pool[j]≠str_pool[k] then goto not_found;
incr(j); incr(k);
end;
result←true;
not_found: str_eq_str←result;
end;
@ The initial values of |str_pool|, |str_start|, |pool_ptr|,
and |str_ptr| are computed by the \.{INIMF} program, based in part
on the information that \.{WEB} has output while processing \MF.
@.INIMF@>
@↑string pool@>
@p @!init function get_strings_started:boolean; {initializes the string pool,
but returns |false| if something goes wrong}
label done,exit;
var k,@!l:0..127; {small indices or counters}
@!m,@!n:text_char; {characters input from |pool_file|}
@!g:str_number; {garbage}
@!a:integer; {accumulator for check sum}
@!c:boolean; {check sum has checked}
begin pool_ptr←0; str_ptr←0; str_start[0]←0;
@<Make the first 128 strings@>;
@<Read the other strings from the \.{MF.POOL} file and return |true|,
or give an error message and return |false|@>;
exit:end;
tini
@ @<Make the first 128...@>=
for k←0 to 127 do
begin if (@<Character |k| cannot be printed@>) then
begin append_char("↑"); append_char("↑");
if k<@'100 then append_char(k+@'100)
else append_char(k-@'100);
end
else append_char(k);
g←make_string;
end
@ The first 128 strings will contain 95 standard ASCII characters, and the
other 33 characters will be printed in three-symbol form like `\.{\↑\↑A}'
unless a system-dependent change is made here. Installations that have
an extended character set, where for example |xchr[@'32]=@t\.{\'↑↑Z\'}@>|,
would like string @'32 to be the single character @'32 instead of the
three characters @'136, @'136, @'132 (\.{\↑\↑Z}). On the other hand,
even people with an extended character set will want to represent string
@'15 by \.{\↑\↑M}, since @'15 is |carriage_return|; the idea is to
produce visible strings instead of tabs or line-feeds or carriage-returns
or bell-rings or characters that are treated anomalously in text files.
The boolean expression defined here should be |true| unless \MF\ internal
code number~|k| corresponds to a non-troublesome visible symbol in the
local character set. At MIT, for example, the appropriate formula would
be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177]|'. If character |k| cannot be
printed, then character |k+@'100| or |k-@'100| must be printable; thus, at
least 64 printable characters are needed.
@↑character set dependencies@>
@↑system dependencies@>
@<Character |k| cannot be printed@>=
(k<" ")∨(k>"~")
@ When the \.{WEB} system program called \.{TANGLE} processes the \.{MF.WEB}
description that you are now reading, it outputs the \PASCAL\ program
\.{MF.PAS} and also a string pool file called \.{MF.POOL}. The \.{INIMF}
@.WEB@>@.INIMF@>
program reads the latter file, where each string appears as a two-digit decimal
length followed by the string itself, and the information is recorded in
\MF's string memory.
@<Glob...@>=
@!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}}
tini
@ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#);
a_close(pool_file); get_strings_started←false; return;
end
@<Read the other strings...@>=
name_of_file←pool_name; {we needn't set |name_length|}
if a_open_in(pool_file) then
begin c←false;
repeat @<Read one string, but return |false| if the
string memory space is getting too tight for comfort@>;
until c;
a_close(pool_file); get_strings_started←true;
end
else bad_pool('! I can''t read MF.POOL.')
@.I can't read MF.POOL@>
@ @<Read one string...@>=
begin if eof(pool_file) then bad_pool('! MF.POOL has no check sum.');
@.MF.POOL has no check sum@>
read(pool_file,m,n); {read two digits of string length}
if m='*' then @<Check the pool check sum@>
else begin if (xord[m]<"0")∨(xord[m]>"9")∨@|
(xord[n]<"0")∨(xord[n]>"9") then
bad_pool('! MF.POOL line doesn''t begin with two digits.');
@.MF.POOL line doesn't...@>
l←xord[m]*10+xord[n]-"0"*11; {compute the length}
if pool_ptr+l+string_vacancies>pool_size then
bad_pool('! You have to increase POOLSIZE.');
@.You have to increase POOLSIZE@>
for k←1 to l do
begin if eoln(pool_file) then m←' '@+else read(pool_file,m);
append_char(xord[m]);
end;
read_ln(pool_file); g←make_string;
end;
end
@ The \.{WEB} operation \.{@@\$} denotes the value that should be at the
end of this \.{MF.POOL} file; any other value means that the wrong pool
file has been loaded.
@↑check sum@>
@<Check the pool check sum@>=
begin a←0; k←1;
loop@+ begin if (xord[n]<"0")∨(xord[n]>"9") then
bad_pool('! MF.POOL check sum doesn''t have nine digits.');
@.MF.POOL check sum...@>
a←10*a+xord[n]-"0";
if k=9 then goto done;
incr(k); read(pool_file,n);
end;
done: if a≠@$ then bad_pool('! MF.POOL doesn''t match; TANGLE me again.');
@.MF.POOL doesn't match@>
c←true;
end
@* \[5] On-line and off-line printing.
Messages that are sent to a user's terminal and to the transcript-log file
are produced by several `|print|' procedures. These procedures will
direct their output to a variety of places, based on the setting of
the global variable |selector|, which has the following possible
values:
\yskip
\hang |term_and_log|, the normal setting, prints on the terminal and on the
transcript file.
\hang |log_only|, prints only on the transcript file.
\hang |term_only|, prints only on the terminal.
\hang |no_print|, doesn't print at all. This is used only in rare cases
before the transcript file is open.
\hang |pseudo|, puts output into a cyclic buffer that is used
by the |show_context| routine; see that routine for the
reasoning behind this curious mode.
\hang |new_string|, appends the output to the current string in the
string pool.
\yskip
\noindent The symbolic names `|term_and_log|', etc., have been assigned
numeric codes that satisfy the convenient relations |no_print+1=term_only|,
|no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|.
Three additional global variables, |tally| and |term_offset| and
|file_offset|, record the number of characters that have been printed
since they were most recently cleared to zero. We use |tally| to record
the length of (possibly very long) stretches of printing; |term_offset|
and |file_offset|, on the other hand, keep track of how many characters
have appeared so far on the current line that has been output to the
terminal or to the transcript file, respectively.
@d no_print=0 {|selector| setting that makes data disappear}
@d term_only=1 {printing is destined for the terminal only}
@d log_only=2 {printing is destined for the transcript file only}
@d term_and_log=3 {normal |selector| setting}
@d pseudo=4 {special |selector| setting for |show_context|}
@d new_string=5 {printing is deflected to the string pool}
@d max_selector=5 {highest selector setting}
@<Glob...@>=
@!log_file : alpha_file; {transcript of \MF\ session}
@!selector : 0..max_selector; {where to print a message}
@!dig : array[0..22] of 0..15; {digits in a number being output}
@!tally : integer; {the number of characters recently printed}
@!term_offset : 0..max_print_line;
{the number of characters on the current terminal line}
@!file_offset : 0..max_print_line;
{the number of characters on the current file line}
@!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for
pseudoprinting}
@!trick_count: integer; {threshold for pseudoprinting, explained later}
@!first_count: integer; {another variable for pseudoprinting}
@ @<Initialize the output routines@>=
selector←term_only; tally←0; term_offset←0; file_offset←0;
@ Macro abbreviations for output to the terminal and to the log file are
defined here for convenience. Some systems need special conventions
for terminal output, and it is possible to adhere to those conventions
by changing |wterm|, |wterm_ln|, and |wterm_cr| here.
@↑system dependencies@>
@d wterm(#)==write(term_out,#)
@d wterm_ln(#)==write_ln(term_out,#)
@d wterm_cr==write_ln(term_out)
@d wlog(#)==write(log_file,#)
@d wlog_ln(#)==write_ln(log_file,#)
@d wlog_cr==write_ln(log_file)
@ To end a line of text output, we call |print_ln|.
@<Basic print...@>=
procedure print_ln; {prints an end-of-line}
begin case selector of
term_and_log: begin wterm_cr; wlog_cr;
term_offset←0; file_offset←0;
end;
log_only: begin wlog_cr; file_offset←0;
end;
term_only: begin wterm_cr; term_offset←0;
end;
no_print,pseudo,new_string: do_nothing;
end; {there are no other cases}
end; {note that |tally| is not affected}
@ The |print_char| procedure sends one character to the desired destination,
using the |xchr| array to map it into an external character compatible with
|input_ln|. All printing comes through |print_ln| or |print_char|.
@<Basic printing...@>=
procedure print_char(@!s:ASCII_code); {prints a single character}
begin case selector of
term_and_log: begin wterm(xchr[s]); wlog(xchr[s]);
incr(term_offset); incr(file_offset);
if term_offset=max_print_line then
begin wterm_cr; term_offset←0;
end;
if file_offset=max_print_line then
begin wlog_cr; file_offset←0;
end;
end;
log_only: begin wlog(xchr[s]); incr(file_offset);
if file_offset=max_print_line then print_ln;
end;
term_only: begin wterm(xchr[s]); incr(term_offset);
if term_offset=max_print_line then print_ln;
end;
no_print: do_nothing;
pseudo: if tally<trick_count then trick_buf[tally mod error_line]←s;
new_string: begin if pool_ptr<pool_size then append_char(s);
end; {drop characters if the string space is full}
end; {there are no other cases}
incr(tally);
end;
@ An entire string is output by calling |print|. Note that if we are outputting
the single standard ASCII character \.c, we could call |print("c")|, since
|"c"=99| is the number of a single-character string, as explained above. But
|print_char("c")| is quicker, so \MF\ goes directly to the |print_char|
routine when it knows that this is safe. (The present implementation
assumes that it is always safe to print a visible ASCII character.)
@↑system dependencies@>
@<Basic print...@>=
procedure print(@!s:integer); {prints string |s|}
var j:pool_pointer; {current character code position}
begin if s≥str_ptr then s←"???" {this can't happen}
@.???@>
else if s<0 then s←"???"; {can't happen}
j←str_start[s];
while j<str_start[s+1] do
begin print_char(str_pool[j]); incr(j);
end;
end;
@ Here is the very first thing that \MF\ prints: a headline that identifies
the version number and package name. The |term_offset| variable is temporarily
incorrect, but the discrepancy is not serious since we assume that the banner
and package identifier together will occupy at most |max_print_line|
character positions.
@<Initialize the output...@>=
wterm(banner);
if pkg_ident=0 then wterm_ln(' (no package preloaded)')
else begin print(pkg_ident); print_ln;
end;
@ The procedure |print_nl| is like |print|, but it makes sure that the
string appears at the beginning of a new line.
@<Basic print...@>=
procedure print_nl(@!s:str_number); {prints string |s| at beginning of line}
begin if ((term_offset>0)∧(odd(selector)))∨@|
((file_offset>0)∧(selector≥log_only)) then print_ln;
print(s);
end;
@ An array of digits is printed by |print_digs|.
@<Basic print...@>=
procedure print_digs(@!k:eight_bits); {prints |dig[k-1]|$\,\ldots\,$|dig[0]|}
begin while k>0 do
begin decr(k);
if dig[k]<10 then print_char("0"+dig[k])
else print_char("A"-10+dig[k]);
end;
end;
@ The following procedure, which prints out the decimal representation of a
given integer |n|, has been written carefully so that it works properly
if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div|
to negative arguments, since such operations are not implemented consistently
by all \PASCAL\ compilers.
@<Basic print...@>=
procedure print_int(@!n:integer); {prints an integer in decimal form}
var k:0..20; {index to current digit; we assume that $|n|<10↑{20}$}
@!m:nonnegative_integer; {used to negate |n| in possibly dangerous cases}
begin k←0;
if n<0 then
begin print_char("-");
if n>-100000000 then negate(n)
else begin m←-1-n; n←m div 10; m←(m mod 10)+1; k←1;
if m<10 then dig[0]←m
else begin dig[0]←0; incr(n);
end;
end;
end;
repeat dig[k]←n mod 10; n←n div 10; incr(k);
until n=0;
print_digs(k);
end;
@ Here is a trivial procedure to print two digits; it is usually called with
a parameter in the range |0≤n≤99|.
@p procedure print_two(@!n:integer); {prints two least significant digits}
begin print_char("0"+((abs(n) div 10) mod 10));
print_char("0"+(abs(n) mod 10));
end;
@ Here is a procedure that asks the user to type a line of input,
assuming that the |selector| setting is either |term_only| or |term_and_log|.
The input is placed into locations |first| through |last-1| of the
|buffer| array, and echoed on the transcript file if appropriate.
This procedure is never called when |interaction<scroll_mode|.
@d prompt_input(#)==begin wake_up_terminal; print(#); term_input;
end {prints a string and gets a line of input}
@p procedure term_input; {gets a line from the terminal}
var k:0..buf_size; {index into |buffer|}
begin update_terminal; {Now the user sees the prompt for sure}
if not input_ln(term_in,true) then fatal_error("End of file on the terminal!");
@.End of file on the terminal@>
term_offset←0; {the user's line ended with \<\rm return>}
decr(selector); {prepare to echo the input}
if last≠first then for k←first to last-1 do print(buffer[k]);
print_ln; incr(selector); {restore previous status}
end;
@* \[6] Reporting errors.
When something anomalous is detected, \MF\ typically does something like this:
$$\vbox{\halign{#\hfil\cr
|print_err("Something anomalous has been detected");|\cr
|help3("This is the first line of my offer to help.")|\cr
|("This is the second line. I'm trying to")|\cr
|("explain the best way for you to proceed.");|\cr
|error;|\cr}}$$
A two-line help message would be given using |help2|, etc.; these informal
helps should use simple vocabulary that complements the words used in the
official error message that was printed. (Outside of the U.S.A., the help
messages should preferably be translated into the local vernacular. Each
line of help is at most 60 characters long, in the present implementation,
so that |max_print_line| will not be exceeded.)
The |print_err| procedure supplies a `\.!' before the official message,
and makes sure that the terminal is awake if a stop is going to occur.
The |error| procedure supplies a `\..' after the official message, then it
shows the location of the error; and if |interaction=error_stop_mode|,
it also enters into a dialog with the user, during which time the help
message may be printed.
@↑system dependencies@>
@ The global variable |interaction| has four settings, representing increasing
amounts of user interaction:
@d batch_mode=0 {omits all stops and omits terminal output}
@d nonstop_mode=1 {omits all stops}
@d scroll_mode=2 {omits error stops}
@d error_stop_mode=3 {stops at every opportunity to interact}
@d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal;
print_nl("! "); print(#);
end
@<Glob...@>=
@!interaction:batch_mode..error_stop_mode; {current level of interaction}
@ @<Set init...@>=interaction←error_stop_mode;
@ \MF\ is careful not to call |error| when the print |selector| setting
might be unusual. The only possible values of |selector| at the time of
error messages are
\yskip\hang|no_print| (when |interaction=batch_mode|
and |log_file| not yet open);
\hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open);
\hang|log_only| (when |interaction=batch_mode| and |log_file| is open);
\hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open).
@<Initialize the print |selector| based on |interaction|@>=
if interaction=batch_mode then selector←no_print@+else selector←term_only
@ A global variable |deletions_allowed| is set |false| if the |get_next|
routine is active when |error| is called; this ensures that |get_next| λλλ
and related routines like |get_token| will never be called recursively.
@↑recursion@>
The global variable |history| records the worst level of error that
has been detected. It has four possible values: |spotless|, |warning_issued|,
|error_message_issued|, and |fatal_error_stop|.
Another global variable, |error_count|, is increased by one when an
|error| occurs without an interactive dialog, and it is reset to zero at
the end of every statementλλλ. If |error_count| reaches 100, \MF\ decides
that there is no point in continuing further.
@d spotless=0 {|history| value when nothing has been amiss yet}
@d warning_issued=1 {|history| value when |begin_diagnostic| has been called}
@d error_message_issued=2 {|history| value when |error| has been called}
@d fatal_error_stop=3 {|history| value when termination was premature}
@<Glob...@>=
@!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?}
@!history:spotless..fatal_error_stop; {has the source input been clean so far?}
@!error_count:-1..100; {the number of scrolled errors since the
last statement ended}
@ The value of |history| is initially |fatal_error_stop|, but it will
be changed to |spotless| if \MF\ survives the initialization process.
@<Set init...@>=
deletions_allowed←true; error_count←0; {|history| is initialized elsewhere}
@ Since errors can be detected almost anywhere in \MF, we want to declare the
error procedures near the beginning of the program. But the error procedures
in turn use some other procedures, which need to be declared |forward|
before we get to |error| itself.
It is possible for |error| to be called recursively if some error arises
when |get_token| is being used to delete a token, or if some fatal error
occurs while \MF\ is trying to fix a non-fatal one. But such recursion
@↑recursion@>
is never more than one level deep.
@<Error handling...@>=
procedure@?normalize_selector; forward;@t\2@>@/
λλλprocedure@?get_token; forward;@t\2@>@/
procedure@?term_input; forward;@t\2@>@/
procedure@?show_context; forward;@t\2@>@/
procedure@?begin_file_reading; forward;@t\2@>@/
procedure@?open_log_file; forward;@t\2@>@/
procedure@?close_files_and_terminate; forward;@t\2@>@/
procedure@?clear_for_error_prompt; forward;@t\2@>@/
procedure@?give_err_help; forward;@t\2@>@/
@t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help;
forward;@;@+gubed
@ Individual lines of help are recorded in the array |help_line|, which
contains entries in positions |0..(help_ptr-1)|. They should be printed
in reverse order, i.e., with |help_line[0]| last.
@d hlp1(#)==help_line[0]←#;@+end
@d hlp2(#)==help_line[1]←#; hlp1
@d hlp3(#)==help_line[2]←#; hlp2
@d hlp4(#)==help_line[3]←#; hlp3
@d hlp5(#)==help_line[4]←#; hlp4
@d hlp6(#)==help_line[5]←#; hlp5
@d help0==help_ptr←0 {sometimes there might be no help}
@d help1==@+begin help_ptr←1; hlp1 {use this with one help line}
@d help2==@+begin help_ptr←2; hlp2 {use this with two help lines}
@d help3==@+begin help_ptr←3; hlp3 {use this with three help lines}
@d help4==@+begin help_ptr←4; hlp4 {use this with four help lines}
@d help5==@+begin help_ptr←5; hlp5 {use this with five help lines}
@d help6==@+begin help_ptr←6; hlp6 {use this with six help lines}
@<Glob...@>=
@!help_line:array[0..5] of str_number; {helps for the next |error|}
@!help_ptr:0..6; {the number of help lines present}
@!use_err_help:boolean; {should the |err_help| list be shown?}
@ @<Set init...@>=
help_ptr←0; use_err_help←false;
@ The |jump_out| procedure just cuts across all active procedure levels and
goes to |end_of_MF|. This is the only nonlocal |@!goto| statement in the
whole program. It is used when there is no recovery from a particular error.
Some \PASCAL\ compilers do not implement non-local |goto| statements.
@↑system dependencies@>
In such cases the body of |jump_out| should simply be
`|close_files_and_terminate|;' followed by a call on some system procedure
that quietly terminates the program.
@<Error hand...@>=
procedure jump_out;
begin goto end_of_MF;
end;
@ Here now is the general |error| routine.
@<Error hand...@>=
procedure error; {completes the job of error reporting}
label continue, exit;
var c:ASCII_code; {what the user types}
λλλ@!s1,@!s2,@!s3,@!s4:integer;
{used to save global variables when deleting tokens}
begin if history<error_message_issued then history←error_message_issued;
print_char("."); show_context;
if interaction=error_stop_mode then @<Get user's advice and |return|@>;
incr(error_count);
if error_count=100 then
begin print_nl("(That makes 100 errors; please try again.)");
@.That makes 100 errors...@>
history←fatal_error_stop; jump_out;
end;
@<Put help message on the transcript file@>;
exit:end;
@ @<Get user's advice...@>=
loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
@.?@>
if last=first then return;
c←buffer[first];
if c≥"a" then c←c+"A"-"a"; {convert to uppercase}
@<Interpret code |c| and |return| if done@>;
end
@ It is desirable to provide an `\.E' option here that gives the user
an easy way to return from \MF\ to the system editor, with the offending
line ready to be edited. But such an extension requires some system
wizardry, so the present implementation simply types out what file should be
edited and the relevant line number.
@↑system dependencies@>
There is a secret `\.D' option available when the debugging routines have
not been commented out.
@↑debugging@>
@<Interpret code |c| and |return| if done@>=
case c of
"1","2","3","4","5","6","7","8","9": if deletions_allowed then
@<Delete |c-"0"| tokens, |goto continue|@>;
@t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/
"E": if base_ptr>0 then
begin print_nl("You want to edit file ");
@.You want to edit file x@>
λλλ print(input_stack[base_ptr].name_field);
print(" at line "); print_int(line);
interaction←scroll_mode; jump_out;
end;
"H": @<Print the help information, |goto continue|@>;
"I":@<Introduce new material from the terminal and |return|@>;
"Q","R","S":@<Change the interaction level and |return|@>;
"X":begin interaction←scroll_mode; jump_out;
end;
othercases do_nothing
endcases;@/
@<Print the menu of available options@>
@ @<Print the menu...@>=
print("Type <return> to proceed, S to scroll future error messages,");@/
print_nl("R to run without stopping, Q to run quietly,");@/
print_nl("I to insert something, ");
if base_ptr>0 then print("E to edit your file,");
if deletions_allowed then
print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,");
print_nl("H for help, X to quit.")
@ Here the author of \MF\ apologizes for making use of the numerical
relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings
|batch_mode|, |nonstop_mode|, |scroll_mode|.
@↑Knuth, Donald Ervin@>
@<Change the interaction...@>=
begin error_count←0; interaction←batch_mode+c-"Q";
print("OK, entering ");
case c of
"Q":begin print_esc("batchmode"); decr(selector);
end;
"R":print_esc("nonstopmode");
"S":print_esc("scrollmode");
end; {there are no other cases}
print("..."); print_ln; update_terminal; return;
end
@ When the following code is executed, |buffer[(first+1)..(last-1)]| may
contain the material inserted by the user; otherwise another prompt will
be given. In order to understand this part of the program fully, you need
to be familiar with \MF's input stacks.λλλ
@<Introduce new material...@>=
begin begin_file_reading; {enter a new syntactic level for terminal input}
{now |state=mid_line|, so an initial blank space will count as a blank}
if last>first+1 then
begin loc←first+1; buffer[first]←" ";
end
else begin prompt_input("insert>"); loc←first;
@.insert>@>
end;
first←last;
cur_input.limit_field←last-1; {no |end_line_char| ends this line}
return;
end
@ We allow deletion of up to 99 tokens at a time.
@<Delete |c-"0"| tokens...@>=
begin s1←cur_tok; s2←cur_cmd; s3←cur_chr; s4←align_state;
align_state←1000000; OK_to_interrupt←false;
if (last>first+1) and (buffer[first+1]≥"0")∧(buffer[first+1]≤"9") then
c←c*10+buffer[first+1]-"0"*11
else c←c-"0";
while c>0 do
begin get_token; {one-level recursive call of |error| is possible}
decr(c);
end;
cur_tok←s1; cur_cmd←s2; cur_chr←s3; align_state←s4; OK_to_interrupt←true;
help2("I have just deleted some text, as you asked.")@/
("You can now delete more, or insert, or whatever.");
show_context; goto continue;
end
@ @<Print the help info...@>=
begin if use_err_help then
begin give_err_help; use_err_help←false;
end
else begin if help_ptr=0 then
help2("Sorry, I don't know how to help in this situation.")@/
("Maybe you should try asking a human?");
repeat decr(help_ptr); print(help_line[help_ptr]); print_ln;
until help_ptr=0;
end;
help4("Sorry, I already gave what help I could...")@/
("Maybe you should try asking a human?")@/
("An error might have occurred before I noticed any problems.")@/
("``If all else fails, read the instructions.''");
goto continue;
end
@ @<Put help message on the transcript file@>=
if interaction>batch_mode then decr(selector); {avoid terminal output}
if use_err_help then
begin print_ln; give_err_help;
end
else while help_ptr>0 do
begin decr(help_ptr); print_nl(help_line[help_ptr]);
end;
print_ln;
if interaction>batch_mode then incr(selector); {enable terminal output}
print_ln
@ A dozen or so λλλ error messages end with a parenthesized integer, so we
save a teeny bit of program space by declaring the following procedure:
@p procedure int_error(@!n:integer);
begin print(" ("); print_int(n); print_char(")"); error;
end;
@ In anomalous cases, the print selector might be in an unknown state;
the following subroutine is called to fix things just enough to keep
running a bit longer.
@p procedure normalize_selector;
begin if job_name>0 then selector←term_and_log
else selector←term_only;
if interaction=batch_mode then decr(selector);
if job_name=0 then open_log_file;
end;
@ The following procedure prints \MF's last words before dying.
@d succumb==begin if interaction=error_stop_mode then
interaction←scroll_mode; {no more interaction}
error;
@!debug if interaction>batch_mode then debug_help;@+gubed@;@/
history←fatal_error_stop; jump_out; {irrecoverable error}
end
@<Error hand...@>=
procedure fatal_error(@!s:str_number); {prints |s|, and that's it}
begin normalize_selector;@/
print_err("Emergency stop"); help1(s); succumb;
@.Emergency stop@>
end;
@ Here is the most dreaded error message.
@<Error hand...@>=
procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness}
begin normalize_selector;
print_err("METAFONT capacity exceeded, sorry [");
@.METAFONT capacity exceeded ...@>
print(s); print_char("="); print_int(n); print_char("]");
help2("If you really absolutely need more capacity,")@/
("you can ask a wizard to enlarge me.");
succumb;
end;
@ The program might sometime run completely amok, at which point there is
no choice but to stop. If no previous error has been detected, that's bad
news; a message is printed that is really intended for the \MF\
maintenance person instead of the user (unless the user has been
particularly diabolical). The index entries for `this can't happen' may
help to pinpoint the problem.
@↑dry rot@>
@<Error hand...@>=
procedure confusion(@!s:str_number);
{consistency check violated; |s| tells where}
begin normalize_selector;
if history<error_message_issued then
begin print_err("This can't happen ("); print(s); print_char(")");
@.This can't happen@>
help1("I'm broken. Please show this to someone who can fix can fix");
end
else begin print_err("I can't go on meeting you like this");
@.I can't go on...@>
help2("One of your earlier faux pas has wounded me deeply,")@/
("so I'm barely conscious. Please fix it and try again.");
end;
succumb;
end;
@ Users occasionally want to interrupt \MF\ while it's running.
If the \PASCAL\ runtime system allows this, one can implement
a routine that sets the global variable |interrupt| to some nonzero value
when such an interrupt is signalled. Otherwise there is probably at least
a way to make |interrupt| nonzero using the \PASCAL\ debugger.
@↑system dependencies@>
@↑debugging@>
@d check_interrupt==begin if interrupt≠0 then pause_for_instructions;
end
@<Global...@>=
@!interrupt:integer; {should \MF\ pause for instruction?}
@!OK_to_interrupt:boolean; {should interrupts be observed?}
@ @<Set init...@>=
interrupt←0; OK_to_interrupt←true;
@ When an interrupt has been detected, the program goes into its
highest interaction level and lets the user have the full flexibility of
the |error| routine. \MF\ checks for interrupts only at times when it is
safe to do this.
@p procedure pause_for_instructions;
begin if OK_to_interrupt then
begin interaction←error_stop_mode;
if (selector=log_only)∨(selector=no_print) then
incr(selector);
print_err("Interruption");
@.Interruption@>
help3("You rang?")@/
("Try to insert some instructions for me (e.g.,`Iλλλshowlists'),")@/
("unless you just want to quit by typing `X'.");
deletions_allowed←false; error; deletions_allowed←true;
interrupt←0;
end;
end;
@* \[7] Arithmetic with scaled numbers.
The principal computations performed by \MF\ are done entirely in terms of
integers less than $2↑{31}$ in magnitude; thus, the arithmetic specified in this
program can be carried out in exactly the same way on a wide variety of
computers, including some small ones.
But \PASCAL\ does not define the @!|div|
operation in the case of negative dividends; for example, the result of
|(-2*n-1) div 2| is |-n-1| on some computers and |-n| on others.
There are two principal types of arithmetic: ``Translation-preserving,''
in which the identital |(a+q*b)div b=(a div b)+q| is valid; and
``negation-preserving,'' in which |(-a)div b=-(a div b)|. This leads to
two \MF s, which can produce different results, although the differences
should be negligible when the language is being used properly.
The \TeX\ processor has been defined carefully so that both varieties
of arithmetic will produce identical output, but it would be too
inefficient to constrain \MF\ in a similar way.
@d el_gordo == @'17777777777 {$2↑{31}-1$, the largest value \MF\ likes}
@ One of \MF's most common operations is the calculation of
$\lfloor{a+b}\over2\rfloor$,
the midpoint of two given integers |a| and~|b|. The only decent way to do
this in \PASCAL\ is to write `|(a+b) div 2|'; but on most machines it is
far more efficient to calculate `|(a+b)| right shifted one bit'.
Therefore the midpoint operation will always be denoted by `|half(a+b)|'
in this program. If \MF\ is being implemented with languages that permit
binary shifting, the |half| macro should be changed to make this operation
as efficient as possible.
@d half(#)==(#) div 2
@ A single computation might use several subroutine calls, and it is
desirable to avoid producing multiple error messages in case of arithmetic
overflow. So the routines below set the global variable |arith_error| to |true|
instead of reporting errors directly to the user.
@<Glob...@>=
@!arith_error:boolean; {has arithmetic overflow occurred recently?}
@ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples
of $2↑{-16}$. In other words, a binary point is assumed to be sixteen bit
positions from the right end of a binary computer word.
@d unity == @'200000 {$2↑{16}$, represents 1.00000}
@d two == @'400000 {$2↑{17}$, represents 2.00000}
@<Types...@>=
@!scaled = integer; {this type is used for scaled integers}
@!nonnegative_integer=0..el_gordo; {$0\L x<2↑{31}$}
@!small_number=0..63; {this type is self-explanatory}
@ The following function is used to create a scaled integer from a decimal
fraction $(.d_0d_1\ldots d_{k-1})$, where |0≤k≤16|. The digit $d_i$ is
given in |dig[i]|, and the calculation produces a correctly rounded result.
@p function round_decimals(@!k:small_number) : scaled;
{converts a decimal fraction}
var a:integer; {the accumulator}
begin a←0;
while k>0 do
begin decr(k); a←(a+dig[k]*two) div 10;
end;
round_decimals←half(a+1);
end;
@ Conversely, here is a procedure analogous to |print_int|. If the output
of this procedure is subsequently read by \MF\ and converted by the
|round_decimals| routine above, it turns out that the original value will
be reproduced exactly. [{\sl Proof:\/} If round$(x)=\lfloor
x+{1\over2}\rfloor$ and if $\alpha<1$, it is not difficult to verify that
round$(\alpha\,\hbox{round}( \alpha↑{-1}n))=n$ for all integers |n|. In
our case $\alpha=2↑{16}/10↑5$.]
@p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five
digits}
begin if s<0 then
begin print_char("-"); negate(s); {print the sign, if negative}
end;
print_int(s div unity); {print the integer part}
s←((s mod unity) * 3125 + 1024) div 2048;
{now |0≤s<100000| is the fraction part}
print_char(".");
repeat print_char("0"+(s div 10000)); s←10*(s mod 10000);
until s=0;
end;
@ The |scaled| quantities in \MF\ programs are generally supposed to be
less than $2↑{12}$ in absolute value, so \MF\ does much of its internal
arithmetic with 28~significant bits of precision. A |fraction| denotes
a scaled integer whose binary point is assumed to be 28 bit positions
from the right.
@d fraction_half==@'1000000000 {$2↑{27}$, represents 0.50000000}
@d fraction_one==@'2000000000 {$2↑{28}$, represents 1.00000000}
@d fraction_two==@'4000000000 {$2↑{29}$, represents 2.00000000}
@d fraction_three==@'6000000000 {$3\cdot2↑{28}$, represents 3.00000000}
@d fraction_four==@'4000000000 {$2↑{30}$, represents 4.00000000}
@<Types...@>=
@!fraction=integer; {this type is used for scaled fractions}
@ In fact, the two sorts of scaling aren't quite sufficient; \MF\ has yet
another, used internally to keep track of angles in units of $2↑{-20}$
degrees.
@d forty_five_deg==@'264000000 {$45\cdot2↑{20}, represents $45↑\circ$}
@d ninety_deg==@'550000000 {$90\cdot2↑{20}, represents $90↑\circ$}
@d one_eighty_deg==@'1320000000 {$180\cdot2↑{20}, represents $180↑\circ$}
@d three_sixty_deg==@'2640000000 {$360\cdot2↑{20}, represents $360↑\circ$}
@<Types...@>=
@!angle=integer; {this type is used for scaled angles}
@ The |make_fraction| routine produces the |fraction| equivalent of
|p/q|, given integers |p| and~|q|; it computes the integer
$f=\lfloor2↑{28}p/q+{1\over2}\rfloor$, when $p$ and $q$ are
positive. If |p| and |q| are both of
the same scaled type |t|, we have |make_fraction(t,t)=fraction|;
and it's also possible to use the subroutine ``backwards,'' using
the relation |make_fraction(t,fraction)=t| between scaled types.
If the result would have magnitude $2↑{31}$ or more, |make_fraction|
sets |arith_error←true|. Most of \MF's internal computations have
been designed to avoid this sort of error.
If this subroutine were programmed in assembly language on a typical
machine, we could simply compute |(@t$2↑{28$@>*p)div q|, since a
double-precision product can often be input to a fixed-point division
instruction. But when we are restricted to \PASCAL\ arithmetic it
is necessary either to resort to multiple-precision maneuvering
or to use a simple but slow iteration. The multiple-precision technique
would be about three times faster than the code adopted here, but it
would be comparatively long and tricky, involving about sixteen
additional multiplications and divisions.
This operation is not part of \MF's ``inner loop,'' so it has been
implemented in a simple but slow way that avoids multiplication
and division except in the initial stage. Many of the routines that
follow could, similarly, be made to run about three times faster
at the cost of longer and trickier programs; but \MF\ as a whole
would not be speeded up by very much, so the author has opted for
simplicity and robustness.
@p function make_fraction(@!p,@!q:integer):fraction;
var @!f:integer; {the fraction bits, with a leading 1 bit}
@!n:integer; {the integer part of $\vert p/q\vert$}
@!negative:boolean; {should the result be negated?}
begin if p≥0 then negative←false
else begin negate(p); negative←true;
end;
if q≤0 then
begin if q=0 then confusion("fraction");
@:this can't happen fraction}{\quad fraction@>
negate(q); negative←¬ negative;
end;
n←p div q; p←p mod q;
if n<8 then n←(n-1)*fraction_one
else begin arith_error←true; n←@'14000000000; {6.00000000}
end;
@<Compute $f=\lfloor 2↑{28}(1+p/q)+{1\over2}\rfloor$@>;
if negative then make_fraction←-(f+n)
else make_fraction←f+n;
end;
@ The |repeat| loop here preserves the following invariant relations:
(i)~|0≤p<q|; (ii)~$fq+p=2↑k(q+p↓0)$, where $k$ is an integer and
$p↓0$ is the original value of~$p$.
Notice that the computation specifies
|(p-q)+p| instead of |(p+p)-q|, because the latter could overflow.
Let us hope that optimizing compilers do not miss this point.
@<Compute $f=\lfloor 2↑{28}(1+p/q)+{1\over2}\rfloor$@>=
f←1;
repeat p←(p-q)+p;
if p≥0 then f←f+f+1
else begin f←f+f; p←p+q;
end;
until f≥fraction_one;
if (p-q)+p≥0 then incr(f)
@ The dual of |make_fraction| is |take_fraction|, which multiplies a
given integer~|q| by a fraction~|f|. When the operands are positive, it
computes $p=\lfloor qf/2↑{28}+{1\over2}\rfloor$, a symmetric function
of |q| and~|f|.
@p function take_fraction(@!q:integer;@!f:fraction):integer;
var @!p:integer; {the fraction so far}
@!negative:boolean; {should the result be negated?}
@!n:integer; {additional multiple of $q$}
begin if f≥0 then negative←false
else begin negate(f); negative←true;
end;
n←f div fraction_one;
if q≤el_gordo div n then n←n*q
else begin arith_error←true; n←el_gordo-q;
end;
f←(f mod fraction_one)+fraction_one;
@<Compute $p=\lfloor qf/2↑{28}+{1\over2}\rfloor-q$@>;
if (n-el_gordo)+p>0 then
begin arith_error←true; n←el_gordo-p;
end;
if negative then take_fraction←-(n+p)
else take_fraction←n+p;
end;
@ The invariant relations in this case are (i)~$\lfloor(qf+p)/2↑k\rfloor
=\lfloor qf↓0/2↑{28}+{1\over2}\rfloor+q$, where $k$ is an integer and
$f↓0$ is the original value of~$f$; (ii)~$2↑k\L f<2↑{k+1}$.
@<Compute $p=\lfloor qf/2↑{28}+{1\over2}\rfloor-q$@>=
p←@'1000000000; {$2↑{27}$; the invariants hold now with $k=28$}
repeat if odd(f) then p←half(p+q)@+else p←half(p);
f←half(f);
until f=1
@ Here is an example of a typical use of the |fraction|-oriented routines.
It computes the function
$${1\over3\tau}f(\theta,\phi)=
{\tau↑{-1}\bigl(2+\sqrt2(\sin\theta-{1\over16}\sin\phi)
(\sin\phi-{1\over16}\sin\theta)(cos\theta-\cos\phi)\bigr)\over
3\,\bigl(1+{1\over2}(\sqrt5-1)\cos\theta+{1\over2}(3-\sqrt5\,)\cos\phi\bigr)},$$
where $\tau$ is a |scaled| ``tension'' parameter. This is \MF's magic
fudge factor for placing the first control point of a curve that starts
at an angle $\theta$ and ends at an angle $\phi$ from the straight path.
(Actually, if the stated quantity exceeds 4, \MF\ reduces it to~4.)
The tension is at least $3\over4$, hence the numerator of the expression above
is less than ${4\over3}(2+\sqrt2\cdot{17\over16}\cdot{17\over16}\cdot2)<7$;
and the denominator is at most~6. Hence the fixed-point calculations below
are guaranteed to stay within the bounds of a 32-bit computer word.
The angles $\theta$ and $\phi$ are given implicity in terms of |fraction|
arguments |st|, |ct|, |sf|, and |cf|, representing $\sin\theta$, $\cos\theta$,
$\sin\phi$, and $\cos\phi$, respectively.
@p function velocity(@!st,@!ct,@!sf,@!cf:fraction;@!tension:scaled):fraction;
var @!acc,@!num,@!denom:integer; {registers for intermediate calculations}
begin acc←take_fraction(st-(sf div 16), sf-(st div 16));
acc←take_fraction(acc,ct-cf);
num←fraction_two+take_fraction(acc,379625062); {that's $2↑{28}\sqrt2$}
denom←fraction_three+take_fraction(ct,497706707)+take_fraction(cf,307599661);
{the constants are $3\cdot2↑{28}$ times
$.618\ldots$ and $1-.618\ldots=(.618\ldots\,)↑2$}
if tension≠unity then
if tension≥@'2000000 then num←take_fraction(num div @'10000,tension)
else num←take_fractioon(num,tension*@'10000);
if num div 4≥denom then velocity←fraction_four
else velocity←make_fraction(num,denom);
end;
@* \[8] Algebraic and transcendental functions.
\MF\ computes all special functions from scratch, without relying on
|real| arithmetic or system subroutines for sines, cosines, etc.
@ To get the square root of a |scaled| number |x|, we want to calculate
$s=\lfloor 2↑8\sqrt x +{1\over2}\rfloor$. If $x>0$, this is the unique
integer such that $2↑{16}x-s\L s↑2<2↑{16}x+s$. The following subroutine
determines $s$ by an iterative method that maintains the invariant
relations $x=2{46-2k}x↓0\bmod 2↑{30}$, $0<y=\lfloor 2↑{16-2k}x↓0\rfloor
-s↑2+s\L q=2s$. The value of~$y$ might, however, be zero at the start
of the first iteration.
@p function square_rt(@!x:scaled):scaled;
var @!k:small_number; {iteration control counter}
@!y,@!q:integer; {registers for intermediate calculations}
begin if x≤0 then @<Handle square root of zero or negative argument@>
else begin k←23; q←2;
while x<fraction_two do {i.e., |while x<@t$2↑{29}$@>|\unskip}
begin decr(k); x←x+x+x+x;
end;
if x<fraction_four then y←0
else begin x←x-fraction_four; y←1;
end;
repeat @<Decrease |k| by 1, maintaining the invariant
relations between |x|, |y|, and~|q|@>;
until k=0;
square_rt←half(q);
end;
end;
@ @<Handle square root of zero...@>=
begin if x<0 then
begin print_err("Square root of ");
print_scaled(x); print(" has been replaced by 0");
help2("Since I don't take square roots of negative numbers,")@/
("I'm zeroing this one. Proceed, with fingers crossed.");
error;
end;
square_rt←0;
end
@ @<Decrease |k| by 1, maintaining...@>=
x←x+x; y←y+y;
if x≥fraction_four then {note that |fraction_four=@t$2↑{30}@>|}
begin x←x-fraction_four; incr(y);
end;
x←x+x; y←y+y-q; q←q+q;
if x≥fraction_four then
begin x←x-fraction_four; incr(y);
end;
if y>q then
begin y←y-q; q←q+2;
end
else if y≤0 then
begin q←q-2; y←y-q;
end;
decr(k)
@ Pythagorean addition $\sqrt{a↑2+b↑2}$ is implemented by an elegant
iterative scheme due to Cleve Moler and Donald Morrison [{\sl IBM Journal
@↑Moler, Cleve@>
@↑Morrison, Donald Ross@>
of Research and Development\/ \bf27} (1983), 577--581]. It modifies |a| and~|b|
in such a way that their pythagorean sum remains invariant, while the
smaller argument decreases.
@p function pyth_add(@!a,@!b:integer):integer;
label done;
var @!r:fraction; {register used to transform |a| and |b|}
@!big:boolean; {is the result dangerously near $2↑{31}$?}
begin a←abs(a); b←abs(b);
if a<b then
begin r←b; b←a; a←r;
end; {now |0≤b≤a|}
if a>0 then
begin if a<fraction_two then big←false
else begin a←(a+2) div 4; b←(b+2) div 4; big←true;
end; {we reduced the precision to avoid arithmetic overflow}
@<Replace |a| by an approximation to $\sqrt{a↑2+b↑2}$@>;
if big then
if a<fraction_four then a←a+a+a+a
else begin arith_error←true; a←el_gordo;
end;
end;
pyth_add←a;
end;
@ The key idea here is to reflect the vector $(a,b)$ about the
line through $(a,b/2)$.
@<Replace |a| by an approx...@>=
loop@+ begin r←make_fraction(b,a+a);
r←take_fraction(r,r); {now $r\approx b↑2/4a↑2$}
if r=0 then goto done;
a←a+take_fraction(a+a,r); b←take_fraction(b,r);
end;
done:
@ The subroutines for logarithm and exponential involve two tables.
The first is simple: |two_to_the[k]| equals $2↑k$. The second involves
a bit more calculation, which the author claims to have done correctly:
|spec_log[k]| is $2↑{27}$ times $\ln\bigl(1/(1-2↑{-k})\bigr)=
2↑{-k}+{1\over2}2↑{-2k}+{1\over3}2↑{-3k}+\cdots\,$, rounded to the
nearest integer.
@<Glob...@>=
@!two_to_the:array[0..30] of integer; {powers of two}
@!spec_log:array[1..28] of integer; {special logarithms}
@ @<Local variables for initialization@>=
@!k:small_number; {for short loops}
@ @<Set init...@>=
two_to_the[0]←1;
for k←1 to 28 do two_to_the[k]←2*two_to_the[k-1];
spec_log[1]←93032640;
spec_log[2]←38612034;
spec_log[3]←17922280;
spec_log[4]←8662214;
spec_log[5]←4261238;
spec_log[6]←2113709;
spec_log[7]←1052693;
spec_log[8]←525315;
spec_log[9]←262400;
spec_log[10]←131136;
spec_log[11]←65552;
spec_log[12]←32772;
spec_log[13]←16385;
for k←14 to 27 do spec_log[k]←two_to_the[27-k];
spec_log[28]←1;
@ Here is the routine that calculates $2↑8$ times the natural logarithm
of a |scaled| quantity; it is an integer approxite to $2↑{24}\ln(x/2↑{16})$,
when |x| is a given positive integer.
The method is based on exercise 1.2.2--25 in {\sl The Art of Computer
Programming\/}: During the main iteration we have $1\L 2↑{-30}x<1/(1-2↑{1-k})$,
and the logarithm of $2↑{30}x$ remains to be added to an accumulator
register called~$y$. Three auxiliary bits of accuracy are retained in~$y$
during the calculation.
@p function m_log(@!x:scaled):scaled;
var @!y,@!z:integer; auxiliary registers}
begin if x≤0 then @<Handle non-positive logarithm@>
else begin y←1302456956+4; {$14\times2↑{27}\ln2\approx1302456956.421063}
z←27595+32768; {and $2↑{16}\times .421063\approx 27595}
while x<fraction_four do
begin x←x+x; y←y-93032640; z←z+48782;
end; {$2↑{27}\ln2\approx 93032639.74436163$
and $2↑{16}\times.74436163\approx 48782$}
y←y+(z div unity); k←2;
while x>fraction_four+4 do
@<Increase |k| until |x| can be multiplied by a
factor of $2↑{-k}$, and adjust $y$ accordingly@>;
m_log←y div 8;
end;
end;
@ @<Increase |k| until |x| can...@>=
begin z←((x-1) div two_to_the[k])+1; {$z=\lceil x/2↑k\rceil$}
while x<fraction_four+z do
begin z←half(z+1); k←k+1;
end;
y←y+spec_log[k]; x←x-z;
end
@ @<Handle non-positive logarithm@>=
begin print_err("Logarithm of ");
print_scaled(x); print(" has been replaced by 0");
help2("Since I don't take logs of non-positive numbers,")@/
("I'm zeroing this one. Proceed, with fingers crossed.");
error; m_log←0;
end
@ Conversely, the exponential routine calculates $\exp(x/2↑8)$,
when |x| is |scaled|. The result is an integer approximation to
$2↑{16}\exp(x/2↑{24})$, when |x| is regarded as an integer.
@p function m_exp(@!x:scaled):scaled;
var @!k:small_number; {loop control index}
@!y,@!z:integer; {auxiliary registers}
begin if x>174436200 then
{$2↑{24}\ln((2↑{31}-1)/2↑{16})\approx 174436199.51$}
begin arith_error←true; m_exp←el_gordo;
end
else if x<-197694359 then m_exp←0
{$2↑{24}\ln(2↑{-1}/2↑{16})\approx-197694359.45$}
else begin if x≤0 then
begin z←-8*x; y←@'4000000; {$y=2↑{20}$}
end
else begin if x≤127919879 then z←1023359037-8*x
{$2↑{27}\ln((2↑{31}-1)/2↑{20})\approx 1023359037.125$}
else z←8*(174436200-x); {|z| is always nonnegative}
y←el_gordo;
end;
@<Multiply |y| by $\exp(-z/2↑{27})$@>;
if x≤127919879 then m_exp←(y+8) div 16@+else m_exp←y;
end;
end;
@ The idea here is that subtracting |spec_log[k]| from |z| corresponds
to multiplying |y| by $1-2↑{-k}$.
A subtle point (that had to be checked) was that if $x=127919879$, the
value of~|y| will decrease so that |y+8| doesn't overflow. In fact,
$z$ will be 5 in this case, and |y| will decrease by~64 when |k=25|
and by~16 when |k=27|.
@<Multiply |y| by...@>=
k←1;
while z>0 do
begin while z≥spec_log[k] do
begin z←z-spec_log[k];
y←y-1-((y-two_to_the[k-1]) div two_to_the[k]);
end;
incr(k);
end
@ The trigonometric subroutines use an auxiliary table such that
|spec_atan[k]| contains an approximation to the |angle| whose tangent
is~$1/2↑k$.
@<Glob...@>=
@!spec_atan:array[1..26] of angle; {$\arctan2↑{-k}$ times $2↑{20}\cdot180/\pi$}
@ @<Set init...@>=
spec_atan[1]←27855475;
spec_atan[2]←14718068;
spec_atan[3]←7471121;
spec_atan[4]←3750058;
spec_atan[5]←1876857;
spec_atan[6]←938658;
spec_atan[7]←469357;
spec_atan[8]←234682;
spec_atan[9]←117342;
spec_atan[10]←58671;
spec_atan[11]←29335;
spec_atan[12]←14668;
spec_atan[13]←7334;
spec_atan[14]←3667;
spec_atan[15]←1833;
spec_atan[16]←917;
spec_atan[17]←458;
spec_atan[18]←229;
spec_atan[19]←115;
spec_atan[20]←57;
spec_atan[21]←29;
spec_atan[22]←14;
spec_atan[23]←7;
spec_atan[24]←4;
spec_atan[25]←2;
spec_atan[26]←1;
@ Given integers |x| and |y|, not both zero, the |m_atan| function
returns the |angle| whose tangent points in the direction $(x,y)$.
This subroutine first determines the correct octant, then solves the
problem for |0≤y≤x|, then converts the result appropriately to
return an answer in the range |-one_eighty_deg≤@t$\theta$@>≤one_eighty_deg|.
(The answer is |+one_eighty_deg| if |y=0| and |x<0|, but an answer of
|-one_eighty_deg| is possible if, for example, |y=-1| and $x=2↑{30}$.)
The octants are represented in a ``Gray code,'' since that turns out
to be computationally simplest.
@d switch_x_and_y=1
@d negate_x=2
@d negate_y=4
@d first_octant=0
@d second_octant=switch_x_and_y
@d third_octant=switch_x_and_y+negate_x
@d fourth_octant=negate_x
@d fifth_octant=negate_x+negate_y
@d sixth_octant=switch_x_and_y+negate_x+negate_y
@d seventh_octant=switch_x_and_y+negate_y
@d eighth_octant=negate_y
@p function n_atan(@!x,@!y:integer):integer;
var @!z:angle; {auxiliary register}
@!t:integer; {temporary storage}
@!k:small_number; {loop counter}
begin if x≥0 then octant←0
else begin negate(x); octant←negate_x;
end;
if y<0 then
begin negate(y); octant←octant+negate_y;
end;
if x<y then
begin t←y; y←x; x←t; octant←octant+switch_x_and_y;
end;
if x=0 then @<Handle undefined arctangent@>
else begin @<Set |z| to the arctangent determined by $(x,y)$@>;
@<Return an appropriate answer based on |z| and |octant|@>;
end;
end;
@ @<Handle undefined arctangent@>=
begin print_err("Arctangent of (0,0) is taken as zero");
help2("The `angle' between two identical points is undefined.")@/
("I'm zeroing this one. Proceed, with fingers crossed.");
error; n_atan←0;
end
@ @<Return an appropriate answer...@>=
case octant of
first_octant:n_atan←z;
second_octant:n_atan←ninety_deg-z;
third_octant:n_atan←ninety_deg+z;
fourth_octant:n_atan←one_eighty_deg-z;
fifth_octant:n_atan←z-one_eighty_deg;
sixth_octant:n_atan←-z-ninety_deg;
seventh_octant:n_atan←z-ninety_deg;
eighth_octant:n_atan←-z;
end {there are no other cases}
@ At this point we have |x≥y≥0|, and |x>0|. The numbers are scaled up
or down until $2↑{28}\L x<2↑{29}$, so that accurate fixed-point calculations
will be made.
@<Set |z| to the arctangent...@>=
while x≥fraction_two do
begin x←half(x); y←half(y);
end;
z←0;
if y>0 then
begin while x<fraction_one do
begin x←x+x; y←y+y;
end;
@<Increase |z| to the arctangent determined by $(x,y)$@>;
end
@ During the calculations of this section, variables |x| and~|y|
represent actual coordinates $(x,2↑{-k}y)$. We will maintain the
condition |x≥y|, so that the tangent will be at most $2↑{-k}$.
If $x<2y$, the tangent is greater than $2↑{-k-1}$. The transformation
$(a,b)\mapsto(a+b\tan\phi,b-a\tan\phi)$ replaces $(a,b)$ by
coordinates whose angle has decreased by~$\phi$; in the special case
$a=x$, $b=2↑{-k}y$, and $\tan\phi=2↑{-k-1}$, this operation reduces
to the particularly simple iteration shown here. [Cf.~John E. Meggitt,
@↑Meggitt, John E.@>
{\sl IBM Journal of Research and Development\/ \bf6} (1962), 210--226.]
The initial value of |x| will be multiplied by at most
$(1+{1\over2})(1+{1\over8})(1+{1\over32})\cdots\approx 1.7584$; hence
there is no chance of integer overflow.
@<Increase |z|...@>=
k←0;
repeat y←y+y; incr(k);
if y>x then
begin z←z+spec_atan[k]; t←x; x←x+(y div two_to_the[k+k]; y←y-t;
end;
until k=15;
repeat y←y+y; incr(k);
if y>x then
begin z←z+spec_atan[k]; y←y-x;
end;
until k=26
@ Conversely, the |n_sin_cos| routine takes an |angle| and produces the sine
and cosine of that angle. The results of this routine are
stored in global integer variables |n_sin| and |n_cos|.
@<Glob...@>=
@!n_sin,n_cos:integer; {results computed by |n_sin_cos|}
@ Given an integer |z| that is $2↑{20}$ times an angle $\theta$ in degrees,
the purpose of |n_sin_cos(z)| is to set
|x=@t$r\cos\theta$@>| and |y=@t$r\sin\theta$@>| (approximately),
for some rather large number~|r|. The maximum of |x| and |y|
will be between $2↑{28}$ and $2↑{30}$, so that there will be hardly
any loss of accuracy. Then |x| and~|y| are divided by~|r|.
@p procedure n_sin_cos(@!z:angle); {computes a multiple of the sine and cosine}
var @!k:small_number; {loop control variable}
@!q:0..7; {specifies the quadrant}
@!r:fraction; {magnitude of |(x,y)|}
@!t:integer; {temporary register}
begin while z<0 do z←z+three_sixty_deg;
z←z mod three_sixty_deg; {now |0≤z<three_sixty_deg|}
q←z div forty_five_deg; z←z mod forty_five_deg;
x←fraction_one; y←x;
if not odd(q) then z←forty_five_deg-z;
@<Subtract angle |z| from |(x,y)|@>;
@<Convert |(x,y)| to the octant determined by~|q|@>;
r←pyth_add(x,y); n_cos←make_fraction(x,r); n_sin←make_fraction(y,r);
end;
@ In this case the octants are numbered sequentially.
@<Convert |(x,...@>=
case q of
0:do_nothing;
1:begin t←x; x←y; y←t;
end;
2:begin t←x; x←-y; y←t;
end;
3:negate(x);
4:begin negate(x); negate(y);
end;
5:begin t←x; x←-y; y←-t;
end;
6:begin t←x; x←y; y←-t;
end;
7:negate(y);
end {there are no other cases}
@ The main iteration of |n_sin_cos| is similar to that of |n_atan| but
applied in reverse. The values of |spec_atan[k]| decrease slowly enough
that this loop is guaranteed to terminate before the (nonexistent) value
|spec_atan[27]| would be required.
@<Subtract angle |z|...@>=
k←1;
while z>0 do
begin if z≥spec_atan[k] then
begin z←z-spec_atan[k]; t←x;@/
x←t+y div two_to_the[k];
y←y-t div two_to_the[k];
end;
incr(k);
end;
if y<0 then y←0 {this precaution may never be needed}
@ λλλ Random number generators: still to come λλλ
@* \[9] Packed data.
In order to make efficient use of storage space, \MF\ bases its major data
structures on a |memory_word|, which contains either a (signed) integer,
possibly scaled, or a small number of fields that are one half or one
quarter of the size used for storing integers.
If |x| is a variable of type |memory_word|, it contains up to four
fields that can be referred to as follows:
$$\vbox{\halign{\hfil#\hfil\hfil\cr
|x|&.|int|&(an |integer|)\cr
|x|&.|sc|\qquad&(a |scaled| integer)\cr
|x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr
|x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword
field)\cr
|x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt
&\qquad\qquad\qquad(four quarterword fields)\cr}}$$
This is somewhat cumbersome to write, and not very readable either, but
macros will be used to make the notation shorter and more transparent.
The \PASCAL\ code below gives a formal definition of |memory_word| and
its subsidiary types, using packed variant records. \MF\ makes no
assumptions about the relative positions of the fields within a word.
Since we are assuming 32-bit integers, a halfword must contain at least
16 bits, and a quarterword must contain at least 8 bits.
@↑system dependencies@>
But it doesn't hurt to have more bits; for example, with enough 36-bit
words you might be able to have |mem_max| as large as 262142.
N.B.: Valuable memory space will be dreadfully wasted unless \MF\ is compiled
by a \PASCAL\ that packs all of the |memory_word| variants into
the space of a single integer. Some \PASCAL\ compilers will pack an
integer whose subrange is `|0..255|' into an eight-bit field, but others
insist on allocating space for an additional sign bit; on such systems you
can get 256 values into a quarterword only if the subrange is `|-128..127|'.
The present implementation tries to accommodate as many variations as possible,
so it makes rather general assumptions. If integers having the subrange
`|min_quarterword..max_quarterword|' can be packed into a quarterword,
and if integers having the subrange `|min_halfword..max_halfword|'
can be packed into a halfword, everything should work satisfactorily.
It is usually most efficient to have |min_quarterword=min_halfword=0|,
so one should try to achieve this unless it causes a severe problem.
The values defined here are recommended for most 32-bit computers.
@d min_quarterword=0 {smallest allowable value in a |quarterword|}
@d max_quarterword=255 {largest allowable value in a |quarterword|}
@d min_halfword==0 {smallest allowable value in a |halfword|}
@d max_halfword==65535 {largest allowable value in a |halfword|}
@ Here are the inequalities that the quarterword and halfword values
must satisfy (or rather, the inequalities that they mustn't satisfy):
@<Check the ``constant''...@>=
if (min_quarterword>0)∨(max_quarterword<127) then bad←11;
if (min_halfword>0)∨(max_halfword<32767) then bad←12;
if (min_quarterword<min_halfword)∨@|
(max_quarterword>max_halfword) then bad←13;
if (mem_base<min_halfword)∨(mem_max≥max_halfword) then bad←14;
if (save_size>max_halfword)∨(max_strings>max_halfword) then bad←15;
if (buf_size>max_halfword) then bad←16;
@ The operation of adding or subtracting |min_halfword| occurs quite
frequently in \MF, so it is convenient to abbreviate this operation
by using the macros |hi| and |ho| for input and output to and from
halfword format. \MF\ will run faster with respect to compilers
that don't optimize expressions like `|x+0|' and `|x-0|', if these
macros are simplified in the obvious way when |min_halfword=0|.
@↑system dependencies@>
@d hi(#)==#+min_halfword
{to put a sixteen-bit item into a halfword}
@d ho(#)==#-min_halfword
{to take a sixteen-bit item from a halfword}
@ The reader should study the following definitions closely:
@↑system dependencies@>
@d sc==int {|scaled| data is equivalent to |integer|}
@<Types...@>=
@!quarterword = min_quarterword..max_quarterword; {1/4 of a word}
@!halfword=min_halfword..max_halfword; {1/2 of a word}
@!two_choices = 1..2; {used when there are two variants in a record}
@!three_choices = 1..3; {used when there are three variants in a record}
@!two_halves = packed record@;@/
@!rh:halfword;
case two_choices of
1: (@!lh:halfword);
2: (@!b0:quarterword; @!b1:quarterword);
end;
@!four_quarters = packed record@;@/
@!b0:quarterword;
@!b1:quarterword;
@!b2:quarterword;
@!b3:quarterword;
end;
@!memory_word = record@;@/
case three_choices of
1: (@!int:integer);
2: (@!hh:two_halves);
3: (@!qqqq:four_quarters);
end;
@!word_file = file of memory_word;
@ When debugging, we may want to print a |memory_word| without knowing
what type it is; so we print it in all modes.
@↑dirty \PASCAL@>@↑debugging@>
@p @!debug procedure print_word(@!w:memory_word);
{prints |w| in all ways}
begin print_int(w.int); print_char(" ");@/
print_scaled(w.sc); print_char(" "); print_ln;@/
print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":");
print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/
print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":");
print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3);
end;
gubed
@* \[10] Dynamic memory allocation.
The \MF\ system does nearly all of its own memory allocation, so that it
can readily be transported into environments that do not have automatic
facilities for strings, garbage collection, etc., and so that it can be in
control of what error messages the user receives. The dynamic storage
requirements of \MF\ are handled by providing a large array |mem| in
which consecutive blocks of words are used as nodes by the \MF\ routines.
Pointer variables are indices into this array, or into another array
called |eqtb| that will be explained below. A pointer variable might
also be a special flag that lies outside the bounds of |mem|, so we
allow pointers to assume any |halfword| value. The minimum halfword
value represents a null pointer.
@d pointer==halfword {a flag or a location in |mem| or |eqtb|}
@d null==mem_base {the null pointer}
@<Glob...@>=
@!temp_ptr:pointer; {a pointer variable for occasional emergency use}
@ The |mem| array is divided once and for all into two regions that are
allocated separately. Locations less than |hi_mem_base| are used for storing
variable-length records consisting of two or more words each. This region
is maintained using an algorithm similar to the one described in exercise
2.5--19 of {\sl The Art of Computer Programming}. However, no size field
appears in the allocated nodes: the program is responsible for knowing the
relevant size when a node is freed. The remaining region of |mem| is
allocated in single words using a conventional \.{AVAIL} stack.
Incidentally, it would be feasible to construct implementations of \MF\ that
are based on 16-bit words instead of 32-bit words, for machines having
comparatively small memories. In such cases it would be desirable to have
two parallel arrays for the upper part of memory, called say \\{mem\_link}|[p]|
and \\{mem\_info}|[p]|,
since the single-word region in the present implementation
consists entirely of |memory_word| items of type |two_halves|.
@↑small computers@>
@<Glob...@>=
@!mem : array[mem_base..mem_max] of memory_word; {the big dynamic storage area}
@ In order to study the memory requirements of particular applications, it
is possible to prepare a version of \MF\ that keeps track of current and
maximum memory usage. When code between the delimiters |@!stat| $\ldots$
|tats| is not ``commented out,'' \MF\ will run a bit slower but it will
report these statistics when |tracing_stats| is positive.
@<Glob...@>=
@!var_used, @!dyn_used : integer; {how much memory is in use}
@!max_var_used : integer; {how much memory was in use}
@ Let's consider the one-word memory region first, since it's the
simplest. The pointer variable |mem_end| holds the highest-numbered location
of |mem| that has ever been used. The free locations of |mem| that
occur between |hi_mem_base| and |mem_end|, inclusive, are of type
|two_halves|, and we write |info(p)| and |link(p)| for the |lh|
and |rh| fields of |mem[p]| when it is of this type. The single-word
free locations form a linked list
$$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$
terminated by |null|.
@d link(#) == mem[#].hh.rh {the |link| field of a memory word}
@d info(#) == mem[#].hh.lh {the |info| field of a memory word}
@<Glob...@>=
@!avail : pointer; {head of the list of available one-word nodes}
@!mem_end : pointer; {the last one-word node used in |mem|}
@ If one-word memory is exhausted, it might mean that the user has forgotten
a right brace. We will define some procedures later that try to help
pinpoint the trouble.
@p @<Declare the procedure called |show_token_list|@>@/
@<Declare the procedure called |runaway|@>
@ The function |get_avail| returns a pointer to a new one-word node whose
|link| field is null. However, \MF\ will halt if there is no more room left.
@p function get_avail : pointer; {single-word node allocation}
var p:pointer; {the new node being got}
begin p←avail; {get top location in the |avail| stack}
if p≠null then avail←link(avail) {and pop it off}
else if mem_end<mem_max then {or go into virgin territory}
begin incr(mem_end); p←mem_end;
end
else begin runaway; {if memory is exhausted, display possible runaway text}
overflow("macro memory size",mem_max+1-hi_mem_base);
{quit; all one-word nodes are busy}
@:METAFONT capacity exceeded macro memory size}{\quad macro memory size@>
end;
link(p)←null; {provide an oft-desired initialization of the new node}
@!stat incr(dyn_used);@+tats@;{maintain statistics}
get_avail←p;
end;
@ Conversely, a one-word node is recycled by calling |free_avail|.
@d free_avail(#)== {single-word node liberation}
begin link(#)←avail; avail←#;
@!stat decr(dyn_used);@+tats@/
end
@ The procedure |flush_list(p)| frees an entire linked list of
one-word nodes that starts at position |p|.
@p procedure flush_list(@!p:pointer); {makes list of single-word nodes
available}
var q:pointer; {the successor of node |p|}
begin while p≠null do
begin q←link(p); free_avail(p); p←q;
end;
end;
@ The available-space list that keeps track of the variable-size portion
of |mem| is a nonempty, doubly-linked circular list of empty nodes,
pointed to by the roving pointer |rover|.
Each empty node has size 2 or more; the first word contains the special
value |max_halfword| in its |link| field and the size in its |info| field;
the second word contains the two pointers for double linking.
Each nonempty node also has size 2 or more. Its first word is of type
|two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|.
Otherwise there is complete flexibility with respect to the contents
of its other fields and its other words.
@d empty_flag == max_halfword {the |link| of an empty variable-size node}
@d is_empty(#) == (link(#)=empty_flag) {tests for empty node}
@d node_size == info {the size field in empty variable-size nodes}
@d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes}
@d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes}
@<Glob...@>=
@!rover : pointer; {points to some node in the list of empties}
@ A call to |get_node| with argument |s| returns a pointer to a new node
of size~|s|, which must be 2~or more. The |link| field of the first word
of this new node is set to null. An overflow stop occurs if no suitable
space exists.
If |get_node| is called with $s=2↑{30}$, it simply merges adjacent free
areas and returns the value |max_halfword|.
@p function get_node(@!s:integer):pointer; {variable-size node liberation}
label found,exit;
var p:pointer; {the node currently under inspection}
@!q:pointer; {the node physically after node |p|}
@!r:integer; {the newly allocated node, or a candidate for this honor}
begin p←rover; {start at some free node in the ring}
repeat @<Try to allocate within node |p| and its physical successors,
and |goto found| if allocation was possible@>;
p←rlink(p); {move to the next node in the ring}
until p=rover; {repeat until the whole list has been traversed}
if s=@'10000000000 then
begin get_node←max_halfword; return;
end;
overflow("box memory size",hi_mem_base-mem_base);
{sorry, nothing satisfactory is left}
@:METAFONT capacity exceeded box memory size}{\quad box memory size@>
found: link(r)←null; {this node is now nonempty}
@!stat var_used←var_used+s; {maintain usage statistics}
if var_used>max_var_used then max_var_used←var_used;
tats@;@/
get_node←r;
exit:end;
@ The following operations remove an empty node from the doubly-linked
list, knowing that it is not the only item in the list.
@d remove_node(#) ==@;
if #=rover then rover←rlink(#);
llink(rlink(#))←llink(#);
rlink(llink(#))←rlink(#)
@ @<Try to allocate...@>=
q←p+node_size(p); {find the physical successor}
while is_empty(q) do {merge |p| with |q|}
begin remove_node(q); q←q+node_size(q);
end;
r←q-s;
if r>p+1 then @<Allocate from the top of node |p| and |goto found|@>;
if (r=p) and ((rlink(p)≠rover) or (llink(p)≠rover)) then
@<Allocate entire node |p| and |goto found|@>;
node_size(p)←q-p {reset the size in case it grew}
@ @<Allocate from the top...@>=
begin node_size(p)←r-p; {store the remaining size}
rover←p; {start searching here next time}
goto found;
end
@ @<Allocate entire...@>=
begin remove_node(p); {delete node |p| from the ring}
rover←rlink(p); {let |rover| rove around}
goto found;
end
@ Conversely, when some variable-size node |p| of size |s| is no longer needed,
the operation |free_node(p,s)| will make its words available, by inserting
|p| as a new empty node just before where |rover| now points.
@p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node
liberation}
var q:pointer; {|llink(rover)|}
begin if @<Node |p| isn't in the variable-size |mem|@> then
confusion("freenode");
@:this can't happen free_node}{\quad freenode@>
node_size(p)←s; link(p)←empty_flag;
q←llink(rover); llink(p)←q; rlink(p)←rover; {set both links}
llink(rover)←p; rlink(q)←p; {insert |p| into the ring}
@!stat var_used←var_used-s;@+tats@;{maintain statistics}
end;
@ Just before \.{INIMF} writes out the memory, it sorts the doubly linked
available space list. The list is probably very short at such times, so a
simple insertion sort is used. The smallest available location will be
pointed to by |rover|, the next-smallest by |rlink(rover)|, etc.
@p @!init procedure sort_avail; {sorts the available variable-size nodes
by location}
var p,@!q,@!r: pointer; {indices into |mem|}
@!old_rover:pointer; {initial |rover| setting}
begin p←get_node(@'10000000000); {merge adjacent free areas}
p←rlink(rover); rlink(rover)←max_halfword; old_rover←rover;
while p≠old_rover do @<Sort |p| into the list starting at |rover|
and advance |p| to |rlink(p)|@>;
p←rover;
while rlink(p)≠max_halfword do
begin llink(rlink(p))←p; p←rlink(p);
end;
rlink(p)←rover; llink(rover)←p;
end;
tini
@ The following |while| loop terminates, since the list that starts at
|rover| ends with |max_halfword| during the sorting procedure.
@<Sort |p|...@>=
if p<rover then
begin q←p; p←rlink(q); rlink(q)←rover; rover←q;
end
else begin q←rover;
while rlink(q)<p do q←rlink(q);
r←rlink(p); rlink(p)←rlink(q); rlink(q)←p; p←r;
end
@* \[11] Data structures for paths.
When a \MF\ user specifies a path, \MF\ will create a list of knots
and control points for the associated cubic spline curves. If the
knots are $z_0$, $z_1$, \dots, $z_n$, there are control points
$z_k↑+$ and $z_{k+1}↑-$ such that the cubic splines between knots
$z_k$ and $z_{k+1}$ are defined by B\'ezier's formula
@:Bezier}{B\'ezier, Pierre@>
$$\eqalign{z(t)&=B(z_k,z_k↑+,z_{k+1}↑-,z_{k+1};t)\cr
&=(1-t)↑3z_k+3(1-t)↑2tz_k↑++3(1-t)t↑2z_{k+1}↑-+t↑3z_{k+1}\cr}$$
for |0≤t≤1|.
There is a 7-word node for each knot $z_k$, containing one word of
control information and six words for the |x| and |y| coordinates
of $z_k↑-$ and $z_k$ and~$z_k↑+$. The control information appears
in the |left_type| and |right_type| fields, which each occupy
a quarter of the first word in the node; they specify properties
of the curve as it enters and leaves the knot. There's also a
halfword |link| field, which points to the following knot.
If the path is a closed contour, knots 0 and |n| are identical;
i.e., the |link| in knot |n-1| points to knot~0. But if the path
is not closed, |left_type| of knot~0 and |right_type| of knot~|n|
are |endpoint|. In the latter case the |link| in knot~|n| points
to knot~0, and the control points $z_0↑-$ and $z_n↑+$ are not used.
@d left_type(#) == mem[#].hh.b0 {characterizes the path entering this knot}
@d right_type(#) == mem[#].hh.b1 {characterizes the path leaving this knot}
@d endpoint=0 {|left_type| at path beginning and |right_type| at path end}
@d x_coord(#) == mem[#+1].sc {the |x| coordinate of this knot}
@d y_coord(#) == mem[#+2].sc {the |y| coordinate of this knot}
@d left_x(#) == mem[#+3].sc {the |x| coordinate of previous control point}
@d left_y(#) == mem[#+4].sc {the |y| coordinate of previous control point}
@d right_x(#) == mem[#+5].sc {the |x| coordinate of next control point}
@d right_y(#) == mem[#+6].sc {the |y| coordinate of next control point}
@d knot_size=7 {number of words in a knot node}
@ Before the B\'ezier control points have been calculated, the memory
space they will ultimately occupy is taken up by information that can be
used to compute them. There are five cases:
\yskip
\textindent{$\bullet$} If |right_type=open|, the curve should leave
the knot in the same direction it entered; \MF\ will figure out a
suitable direction.
\yskip
\textindent{$\bullet$} If |right_type=curl|, the curve should leave the
knot in a direction depending on the angle at which it enters the next
knot and the curl parameter stored in |right_curl|.
\yskip
\textindent{$\bullet$} If |right_type=given|, the curve should leave the
knot in a nonzero direction found in the |x_coord| and |y_coord| fields
of a 3-word node pointed to by |right_given|.
\yskip
\textindent{$\bullet$} If |right_type=explicit|, the B\'ezier control
point for leaving this knot has already been computed; it is in the
|right_x| and |right_y| fields.
\yskip
\textindent{$\bullet$} If |right_type=smooth|, the B\'ezier control
point for leaving this knot has already been computed, just as in the
|explicit| case; furthermore, the |left_type| in this node will also
be |smooth|, and the control points are known to enter and leave this
node in the same direction.
\yskip\noindent
The rules for |left_type| are similar, but they refer to the curve entering
the knot, and to \\{left\_} fields instead of \\{right\_} fields.
Non-|explicit| control points will be chosen based on ``tension'' parameters
stored in the |left_tension| and |right_tension| fields.
For example, the \MF\ path specification
$$\.{z0..z1..z2\{curl 2\}\&z2..z3\{-1,-2\}..tension 3 and 4..p},$$
where \.p is the path `\.{z4..control z45 and z54..z5}', will be represented
by the six knots
$$\vbox{\halign{#\hfil&&\qquad#\hfil\cr
|left_type|&\\{left} info&|x_coord,y_coord|&|right_type|&\\{right} info\cr
\noalign{\yskip}
|endpoint|&---$,\,$---&$x_0,y_0$&|curl|&$1.0,1.0$\cr
|open|&---$,1.0$&$x_1,y_1$&|open|&---$,1.0$\cr
|curl|&$2.0,1.0$&$x_2,y_2$&|curl|&$1.0,1.0$\cr
|given|&$q,1.0$&$x_3,y_3$&|given|&$q,3.0$\cr
|open|&---$,4.0$&$x_4,y_4$&|explicit|&$x_4↑+,y_4↑+$\cr
|explicit|&$x_5↑-,y_5↑-$&$x_5,y_5$&|endpoint|&---$,\,$---\cr}}$$
Here |q| is a 3-word node containing |x_coord=-1.0| and |y_coord=-2.0|.
After the control points have all been calculated, the types in knots
1, 3, and~4 would become |smooth|. Of course, this example is more
complicated than anything a normal user would ever write.
These types must satisfy certain restrictions because of the form of \MF's
path syntax: (i)~|endpoint| and |open| types never appear together in
the same node. (ii)~|open| and |given| types never appear together in
the same node. (iii)~|open| and |curl| types never appear together in
the same node. (iv)~|given| and |curl| types never appear together in the
same node. (v)~|smooth| type appears only when both |left_type| and
|right_type| are |smooth|. (vi)~|explicit| and |smooth| types appear only
in pairs; i.e., the |right_type| of a node is |explicit| or |smooth| if and
only if the |left_type| of the successor node is |explicit| or |smooth|.
(vii)~|endpoint| types occur only at the ends, as mentioned above.
@d left_curl==left_x {curl information when entering this knot}
@d left_given==left_x {pointer to given direction when entering this knot}
@d left_tension==left_y {tension information when entering this knot}
@d right_curl==right_x {curl information when leaving this knot}
@d right_given==right_x {pointer to given direction when leaving this knot}
@d right_tension==right_y {tension information when leaving this knot}
@d coord_node_size=3 {size of nodes pointed to by |left_given|, |right_given|}
@d explicit=1 {|left_type| or |right_type| when control points are known}
@d smooth=2 {like |explicit|, when incoming and outgoing directions are
known to be identical}
@d given=3 {|left_type| or |right_type| when a direction is given}
@d curl=4 {|left_type| or |right_type| when a curl is desired}
@d open=5 {|left_type| or |right_type| when \MF\ should choose the direction}
@ Here is a diagnostic routine that prints the content of a knot list
in symbolic form. It illustrates the conventions discussed above,
and checks for anomalies that might arise while \MF\ is being debugged.
@p procedure print_path(@!h:pointer;@!s:str_number);
label done,done1;
var @!p,@!q,@!r:pointer; {for list traversal}
begin begin_diagnostic;
print_nl("Path at line "); print_int(line); print(s); print_char(":");
@.Path at line...@>
p←h;
repeat if p=null then
begin print_nl("Unexpected end of path!"); goto done;
end;
q←link(p); @<Print information for adjacent knots |p| and |q|@>;
p←q;
if p≠h then print_nl(" ..");
until p=h;
if left_type(h)=open then print("{cycle}");
done:end_diagnostic(true);
end;
@ @<Print information for adjacent knots...@>=
print_scaled(x_coord(p)); print_char(","); print_scaled(y_coord(p));
if right_type(p)≤explicit then @<Check previous |given| or |curl|@>;
case right_type(p) of
endpoint: begin if left_type(p)=open then print("{open?}"); {can't happen}
@.open?@>
if (left_type(q)≠endpoint)∨(q≠h) then q←null; {force an error}
goto done1;
end;
explicit,smooth: @<Print control points between |p| and |q|,
then |goto done1|@>;
open: @<Print information for a curve that begins open@>;
curl: @<Print information for a curve that begins curl@>;
given: @<Print information for a curve that begins given@>;
othercases print("{unknown type!}")
@.unknown type!@>
endcases;@/
if left_type(q)≤smooth then print("..control?") {can't happen}
@.control?@>
else if (right_tension(p)≠unity)∨(left_tension(q)≠unity) then
@<Print tension between |p| and |q|@>;
done1:
@ @<Check previous |given| or |curl|@>=
if left_type(p)=given then
begin r←left_given(p); print_char("{"); print_scaled(x_coord(r));
print_char(","); print_scaled(y_coord(r)); print_char("}");
end
else if left_type(p)=curl then
begin print("{curl "); print_scaled(left_curl(p)); print_char("}");
end
@ @<Print tension between |p| and |q|@>=
begin print("..tension "); print_scaled(right_tension(p));
if right_tension(p)≠left_tension(q) then
begin print(" and "); print_scaled(left_tension(q));
end;
end
@ @<Print control points between |p| and |q|, then |goto done1|@>=
begin if right_type(p)=smooth then if left_type(p)≠smooth then
print("{smooth?}"); {this can't happen}
@.smooth?@>
print("..control "); print_scaled(right_x(p)); print_char(",");
print_scaled(right_y(p)); print(" and ");
if (left_type(q)≠explicit)∧(left_type(q)≠smooth) then
print("??") {can't happen}
@.??@>
else begin print_scaled(left_x(q)); print_char(",");
print_scaled(left_y(q));
end;
goto done1;
end
@ @<Print information for a curve that begins open@>=
if (left_type(p)≠explicit)∧(left_type(p)≠open) then
print("{open?}") {can't happen}
@.open?@>
@ @<Print information for a curve that begins curl@>=
begin if left_type(p)≥given then
if left_type(p)≠curl then print("??") {can't happen}
@.??@>
else if left_curl(p)≠right_curl(p) then
begin print("{curl "); print_scaled(left_curl(p));
print_char("}"); @<Repeat the coordinates@>;
end;
print("{curl "); print_scaled(right_curl(p)); print_char("}");
end
@ @<Print information for a curve that begins given@>=
begin if left_type(p)≥given then
if left_type(p)≠given then print("??") {can't happen}
@.??@>
else if left_given(p)≠right_given(p) then
begin r←left_given(p); print_char("{");
print_scaled(x_coord(r)); print_char(","); print_scaled(y_coord(r));
print_char("}"); @<Repeat the coordinates@>;
end;
r←right_given(p); print_char("{"); print_scaled(x_coord(r)); print_char(",");
print_scaled(y_coord(r)); print_char("}");
end
@ @<Repeat the coordinates@>=
print_char("&"); print_scaled(x_coord(p));
print_char(","); print_scaled(y_coord(p))
@* \[12] Memory layout.
Some areas of |mem| are dedicated to fixed usage, since static allocation is
more efficient than dynamic allocation when we can get away with it. For
example, λλλlocations |mem_base| to |mem_base+3| are always used to store the
specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The
following macro definitions accomplish the static allocation by giving
symbolic names to the fixed positions. Dynamic allocation of variable-size
nodes is restricted to locations |first_mem| through |(hi_mem_base-1)|,
and single-word nodes are dynamically allocated in locations |second_mem|
through |mem_max|, inclusive. It is harmless to let |lig_trick|, |garbage|,
and |backup_head| share the same location of |mem|.
λλλ Fix this all!
I need |left_delta| and |right_delta| in the lower memory area, where
I use their |x_coord| and |y_coord| words only.
@d zero_glue==mem_base {specification for \.{0pt plus 0pt minus 0pt}}
@d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}}
@d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}}
@d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}}
@d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}}
@d first_mem==fil_neg_glue+glue_spec_size {first dynamically allocatable word
in the variable-size |mem|}
@#
@d page_ins_head==hi_mem_base {list of insertion data for current page}
@d contrib_head==hi_mem_base+1 {vlist of items not yet on current page}
@d page_head==hi_mem_base+2 {vlist for current page}
@d temp_head==hi_mem_base+3 {head of a temporary list of some kind}
@d hold_head==hi_mem_base+4 {head of a temporary list of another kind}
@d adjust_head==hi_mem_base+5 {head of adjustment list returned by |hpack|}
@d active==hi_mem_base+6 {head of active list in |line_break|, needs two words}
@d align_head==hi_mem_base+8 {head of preamble list for alignments}
@d end_span==hi_mem_base+9 {tail of spanned-width lists}
@d omit_template==hi_mem_base+10 {a constant token list}
@d null_list==hi_mem_base+11 {permanently empty list}
@d lig_trick==hi_mem_base+12 {a ligature masquerading as a |char_node|}
@d garbage==hi_mem_base+12 {used for scrap information}
@d backup_head==hi_mem_base+13 {head of token list built by |scan_keyword|}
@d second_mem==hi_mem_base+14 {first dynamically allocatable word in
the one-word |mem|}
@<Node |p| isn't in the variable-size |mem|@>=(p<first_mem)∨(p≥hi_mem_base)
@ The following code gets |mem| off to a good start, when \TeX\ is
initializing itself the slow way.
@<Local variables for init...@>=
@!k:integer; {index into |mem|, |eqtb|, etc.}
@ @<Initialize table entries...@>=
for k←mem_base+1 to first_mem-1 do mem[k].sc←0;
{all glue dimensions are zeroed}
@↑data structure assumptions@>
k←mem_base;@+while k<first_mem do {set first words of glue specifications}
begin glue_ref_count(k)←null+1;
stretch_order(k)←normal; shrink_order(k)←normal;
k←k+glue_spec_size;
end;
stretch(fil_glue)←unity; stretch_order(fil_glue)←fil;@/
stretch(fill_glue)←unity; stretch_order(fill_glue)←fill;@/
stretch(ss_glue)←unity; stretch_order(ss_glue)←fil;@/
shrink(ss_glue)←unity; shrink_order(ss_glue)←fil;@/
stretch(fil_neg_glue)←-unity; stretch_order(fil_neg_glue)←fil;@/
rover←first_mem; link(rover)←empty_flag; {now initialize the dynamic memory}
node_size(rover)←hi_mem_base-rover; {which is one big available node}
llink(rover)←rover; rlink(rover)←rover;@/
link(hi_mem_base)←null; info(hi_mem_base)←null;
for k←hi_mem_base+1 to second_mem-1 do
mem[k]←mem[hi_mem_base];{clear list heads}
@<Initialize the special list heads and constant nodes@>;
avail←null; mem_end←second_mem-1; {initialize the one-word memory}
var_used←first_mem-mem_base; dyn_used←second_mem-hi_mem_base;
max_var_used←var_used; {initialize statistics}
@ If \TeX\ is extended improperly, the |mem| array might get screwed up.
For example, some pointers might be wrong, or some ``dead'' nodes might not
have been freed when the last reference to them disappeared. Procedures
|check_mem| and |search_mem| are available to help diagnose such
problems. These procedures make use of two arrays called |free| and
|was_free| that are present only if \TeX's debugging routines have
been included. (You may want to decrease the size of |mem| while you
@↑debugging@>
are debugging; decreasing |mem_max| saves space and time,
and decreasing |hi_mem_base| saves time.)
@<Glob...@>=
@!debug @!free: packed array [mem_base..mem_max] of boolean; {free cells}
@t\hskip1em@>@!was_free: packed array [mem_base..mem_max] of boolean;
{previously free cells}
@t\hskip1em@>@!was_mem_end: pointer; {previous |mem_end|}
@t\hskip1em@>@!panicking:boolean; {do we want to check memory constantly?}
gubed
@ @<Set initial...@>=
@!debug was_mem_end←mem_base; {indicate that everything was previously free}
panicking←false;
gubed
@ Procedure |check_mem| makes sure that the available space lists of
|mem| are well formed, and it optionally prints out all locations
that are reserved now but were free the last time this procedure was called.
@p @!debug procedure check_mem(@!print_locs : boolean);
label done1,done2; {loop exits}
var p,@!q:pointer; {current locations of interest in |mem|}
@!clobbered:boolean; {is something amiss?}
begin for p←mem_base to mem_end do free[p]←false; {you can probably
do this faster}
@<Check single-word |avail| list@>;
@<Check variable-size |avail| list@>;
@<Check flags of unavailable nodes@>;
if print_locs then @<Print newly busy locations@>;
for p←0 to mem_end do was_free[p]←free[p]; {|was_free←free| might be faster}
was_mem_end←mem_end;
end;
gubed
@ @<Check single-word...@>=
p←avail; q←null; clobbered←false;
while p≠null do
begin if (p>mem_end)∨(p<second_mem) then clobbered←true
else if free[p] then clobbered←true;
if clobbered then
begin print_nl("AVAIL list clobbered at ");
@.AVAIL list clobbered...@>
print_int(q); goto done1;
end;
free[p]←true; q←p; p←link(q);
end;
done1:
@ @<Check variable-size...@>=
p←rover; q←null; clobbered←false;
repeat if (p≥hi_mem_base)∨(p<first_mem) then clobbered←true
else if (rlink(p)≥hi_mem_base)∨(rlink(p)<first_mem) then clobbered←true
else if ¬(is_empty(p))∨(node_size(p)<2)∨@|
(p+node_size(p)>hi_mem_base)∨@| (llink(rlink(p))≠p) then clobbered←true;
if clobbered then
begin print_nl("Double-AVAIL list clobbered at ");
print_int(q); goto done2;
end;
for q←p to p+node_size(p)-1 do {mark all locations free}
begin if free[q] then
begin print_nl("Doubly free location at ");
@.Doubly free location...@>
print_int(q); goto done2;
end;
free[q]←true;
end;
q←p; p←rlink(p);
until p=rover;
done2:
@ @<Check flags...@>=
p←mem_base;
while p≤hi_mem_base do {node |p| should not be empty}
begin if is_empty(p) then
begin print_nl("Bad flag at "); print_int(p);
@.Bad flag...@>
end;
while (p≤hi_mem_base) and not free[p] do incr(p);
while (p≤hi_mem_base) and free[p] do incr(p);
end
@ @<Print newly busy...@>=
begin print_nl("New busy locs:");
for p←mem_base to mem_end do
if not free[p] and ((p>was_mem_end) or was_free[p]) then
begin print_char(" "); print_int(p);
end;
end
@ The |search_mem| procedure attempts to answer the question ``Who points
to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem|
that might not be of type |two_halves|. Strictly speaking, this is
@↑dirty \PASCAL@>
undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to
point to |p| purely by coincidence). But for debugging purposes, we want
to rule out the places that do {\sl not\/} point to |p|, so a few false
drops are tolerable.
@p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|}
var q:integer; {current position being searched}
begin for q←mem_base to mem_end do
begin if link(q)=p then
begin print_nl("LINK("); print_int(q); print_char(")");
end;
if info(q)=p then
begin print_nl("INFO("); print_int(q); print_char(")");
end;
end;
@<Search |eqtb| for equivalents equal to |p|@>;
@<Search |save_stack| for equivalents that point to |p|@>;
@<Search |hyph_list| for pointers to |p|@>;
end;
gubed
@* \[13] Choosing control points.
Now we must actually delve into one of \MF's more difficult routines,
the |make_choices| procedure that chooses angles and control points for
the splines of a curve when the user has not specified them explicitly.
The first parameter to |make_choices| points to a list of knots and
path information, as described above; the second parameter points to a
list of coordinate nodes that will be returned to available memory
after all of the control points have been chosen.
A path decomposes into independent segments at ``breakpoint'' knots,
which are knots whose left and right angles are both prespecified in
some way (i.e., neither |left_type| nor |right_type| is open).
@p @<Declare procedures needed by |make_choices|@>@;
procedure make_choices(@!h:knots,@!coords:pointer);
label done,done1,found;
var @!h:pointer; {the first breakpoint}
@!p,@!q:pointer; {consecutive breakpoints being processed}
@<Local variables for |make_choices|@>@;
begin if fetch_parameter(tracing_choices)>0 then
print_path(knots,", before choices");
arith_error←false;
@<If consecutive knots are equal, join them explicitly@>;
@<Find the first breakpoint, |h|, on the path;
insert an artificial breakpoint if the path is an unbroken cycle@>;
p←h;
repeat @<Fill in the control points between |p| and the next breakpoint,
then advance |p| to that breakpoint@>;
until p=h;
if fetch_parameter(tracing_choices)>0 then
print_path(knots,", after choices");
while coords≠null do
begin p←coords; coords←link(p); free_node(p,coord_node_size);
end;
if arith_error then @<Report an unexpected problem during the choice-making@>;
end;
@ @<Report an unexpected problem during the choice...@>=
begin print_err("Some number got too big");
help2("The path that I just computed is out of range.")@/
("So it will probably look funny. Proceed, for a laugh.");
error;
end
@ Two knots in a row with the same coordinates will always be joined
by an explicit ``curve'' whose control points are identical with the
knots.
@<If consecutive knots are equal, join them explicitly@>=
p←knots;
repeat q←link(p);
if x_coord(p)=x_coord(q) then if y_coord(p)=y_coord(q) then
if right_type(p)>explicit then
begin right_type(p)←explicit;
if left_type(p)=open then
begin left_type(p)←curl; left_curl(p)←unity;
end;
left_type(q)←explicit;
if right_type(q)=open then
begin right_type(q)←curl; right_curl(p)←unity;
end;
right_x(p)←x_coord(p); left_x(q)←x_coord(p);
right_y(p)←y_coord(p); left_y(q)←y_coord(p);
end;
p←q;
until p=knots
@ If there are no breakpoints, it is necessary to compute the direction
angles around an entire cycle. In this case the |left_type| of the first
node is temporarily changed to |end_cycle|.
@d end_cycle=open+1
@<Find the first breakpoint, |h|, on the path...@>=
h←knots;
loop@+ begin if left_type(h)≠open then goto done1;
if right_type(h)≠open then goto done1;
h←link(h);
if h=knots then
begin left_type(h)←end_cycle; goto done1;
end;
end;
done1:
@ @<Fill in the control points between |p| and the next breakpoint...@>=
q←link(p);
if right_type(p)≥given then
begin while (left_type(q)=open)∧(right_type(q)=open) do q←link(q);
@<Fill in the control information between
consecutive breakpoints |p| and |q|@>;
end;
p←q
@ Before we can go further into the way choices are made, we need to
consider the underlying theory. The basic ideas implemented in |make_choices|
are due to John Hobby, who introduced the notion of ``mock curvature''
@↑Hobby, John Douglas@>
at a knot. Angles are chosen so that they preserve mock curvature when
a knot is passed, and this has been found to produce excellent results.
\def\k{_{k+1}}
It is convenient to introduce some notations that simplify the necessary
formulas. Let $d_{k,k+1}=\vert z\k-z_k\vert$ be the (nonzero) distance
between knots |k| and |k+1|; and let
$${z_k+1}-z_k\over z_k-z_{k-1}}={d_{k,k+1}\over d_{k-1,k}}e↑{i\psi_k}$$
so that a polygonal line from $z_{k-1}$ to $z_k$ to $z\k$ turns left
through an angle of~$\psi_k$. We assume that $\vert\psi_k\L180↑\circ$.
The control points for the spline from $z_k$ to $z\k$ will be denoted by
$$\eqalign{z_k↑+&=z_k+
\textstyle{1\over3}\rho_k e↑{i\theta_k}(z\k-z_k),\cr
{z\k↑-&=z\k-
\textstyle{1\over3}\sigma\k e↑{i\phi\k}(z\k-z_k),\cr}$$
where $\rho_k$ and $\sigma\k$ are nonnegative ``velocity ratios'' at the
beginning and end of the curve, while $\theta_k$ and $\phi\k$ are the
corresponding ``offset angles.'' These angles satisfy the condition
$$\theta_k+\phi_k+\psi_k=0,\eqno(*)$$
whenever the curve leaves an intermediate knot~|k| in the direction that
it enters.
@ Let $\alpha_k$ and $\beta\k$ be the reciprocals of the ``tension'' of
the curve at its beginning and ending points. This means that
$\rho_k=\alpha_k f(\theta_k,\phi\k)$ and $\sigma\k=\beta\k f(\phi\k,\theta_k)$,
where $f(\theta,\phi)$ is \MF's standard velocity function defined in
the |velocity| subroutine. The cubic spline $B(z_k,z_k↑+,z\k↑-,z\k;t)
has curvature
$${2\sigma\k\sin(\theta_k+\phi\k)-6\sin\theta_k\over\rho_k↑2d_{k,k+1}}
\qquad{\rm and}\qquad
{2\rho\k\sin(\theta_k+\phi\k)-6\sin\phi\k\over\sigma\k↑2d_{k,k+1}}$$
at |t=0| and |t=1|, respectively. The mock curvature is the linear
approximation to this true curvature that arises in the limit for
small $\theta_k$ and~$\phi\k$, if second-order terms are discarded.
The standard velocity function satisfies
$$f(\theta,\phi)=1+O(\theta↑2+\theta\phi+\phi↑2);$$
hence the mock curvatures are respectively
$${2\beta\k(\theta_k+\phi\k)-6\theta_k\over\alpha_k↑2d_{k,k+1}}
\qquad{\rm and}\qquad
{2\alpha\k(\theta_k+\phi\k)-6\phi\k\over\beta_k↑2d_{k,k+1}}.\eqno(**)$$
@ The turning angles $\psi_k$ are given, and equation $(*)$ above
determines $\phi_k$ when $\theta_k$ is known, so the task of
angle selection is essentially to choose appropriate values for each
$\theta_k$. When equation~$(*)$ is used to eliminate $\phi$~variables
from $(**)$, we obtain a system of linear equations of the form
$$A_k\theta_{k-1}+(B_k+C_k)\theta_k+D_k\theta\k=-B_k\psi_k-D_k\psi\k,$$
where
$$A_k={\alpha_{k-1}\over\beta_k↑2d_{k-1,k}},
\qquad B_k={3-\alpha_{k-1}\over\beta_k↑2d_{k-1,k}},
\qquad C_k={3-\beta\k\over\alpha_k↑2d_{k,k+1}},
\qquad D_k={\beta\k\over\alpha_k↑2d_{k,k+1}}.$$
The tensions are always $3\over4$ or more, hence each $\alpha$ and~$\beta$
will be at most $4\over3$. It follows that $B_k\G{5\over4}A_k$ and
$C_k\G{5\over4}D_k$; i.e., the equations are diagonally dominant;
hence they have a unique solution. Moreover, in most cases the tensions
are equal to_1, so that $B_k=2A_k$ and $C_k=2D_k$. This makes the
solution numerically stable, and there is an exponential damping
effect: The data at knot $k\pm j$ affects the angle at knot~$k$ by
a factor of~$O(2↑{-j})$.
@ However, we still must consider the angles at the starting and ending
knots of a non-cyclic path. These angles might be given explicitly, or
they might be specified implicity in terms of an amount of ``curl.''
Let's assume that angles need to be determined for a non-cyclic path
starting at $z_0$ and ending at~$z_n$. Then equations of the form
$$A_k\theta_{k-1}+(B_k+C_k)\theta_k+D_k\theta_{k+1}=R_k$$
have been given for $0<k<n$, and it will be convenient to introduce
equations of the same form for $k=0$ and $k=n$, where
$$A_0=B_0=C_n=D_n=0.$$
If $\theta_0$ is supposed to have a given value $E_0$, we simply
define $C_0=0$, $D_0=0$, and $R_0=E_0$. Otherwise a curl
parameter, $\gamma_0$, has been specified at~$z_0$; this means
that the mock curvature at $z_0$ should be $\gamma_0$ times the
mock curvatue at $z_1$; i.e.,
$${2\beta_1(\theta_0+\phi_1)-6\theta_0\over\alpha_0↑2d_{01}}
=\gamma_0{2\alpha_0(\theta_0+\phi_1)-6\phi_1\over\beta_1↑2d_{01}}.$$
This equation simplifies to
$$(\alpha_0\chi_0+3-\beta_1)\theta_0+
\bigl((3-\alpha_0)\chi_0+\beta_1\bigr)\theta_1=
-\bigl((3-\alpha_0)\chi_0+\beta_1\bigr)\psi_1,$$
where $\chi_0=\alpha_0↑2\gamma_0/\beta_1↑2$; so we can set $C_0=
3-\beta_1+\chi_0\alpha_0$, $D_0=(3-\alpha_0)\chi_0+\beta_1$, $R_0=-D_0\psi_1$.
It can be shown that $C_0>0$ and $C_0B_1-A_1D_0>0$ when $\gamma_0\G0$,
hence the linear equations remain nonsingular.
Similar considerations apply at the right end, when the final angle $\phi_n$
may or may not need to be determined. It is convenient to let $\phi_n=0$,
hence $\theta_n=-\phi_n$. We either have an explicit equation $\theta_n=E_n$,
or we have
$$\bigl((3-\beta_n)\chi_n+\alpha_{n-1})\theta_{n-1}+
(\beta_n\chi_n+3-\alpha_{n-1})\theta_n=0,\qquad
\rho_n={\beta_n↑2\gamma_n\over\alpha_{n-1}↑2}.$$
When |make_choices| chooses angles, it must compute the coefficients of
these linear equations, then solve the equations. To compute the coefficients,
it is necessary to compute arctangents of the given turning angles~$\psi_k$.
When the equations are solved, the chosen directions $\theta_k$ are put
back into the form of control points by essentially computing sines and
cosines.
@ OK, we are ready to make the hard choices of |make_choices|.
@<Fill in the control information between...@>=
@<Calculate the turning angles $\psi_k$ and the distances $d_{k,k+1}$;
set $n$ to the length of the path@>;
@<Remove |open| types at the breakpoints@>;
@<Solve linear equations and fill in the control information
between breakpoints |p| and~|q|,
unless a simple case makes this calculations unnecessary@>
@ It's convenient to precompute quantities that will be needed several
times later. The values of |delta_x[k]| and |delta_y[k]| will be the
coordinates of $z\k-z_k$, and the magnitude of this vector will be
|delta[k]=@t$d_{k,k+1}$@>|. The path angle $\psi_k$ between $z_k-z_{k-1}$
and $z\k-z_k$ will be stored in |psi[k]|.
@<Glob...@>=
@!delta_x,@!delta_y,@!delta:array[0..path_size] of scaled; {knot differences}
@!psi:array[0..path_size] of angle; {turning angles}
@ @<Local variables for |make_choices|@>=
@!k,@!n:0..path_size; {current and final knot numbers}
@!r,@!s,@!t:pointer; {registers for list traversal}
@!sine,@!cosine:fraction; {trig functions of various angles}
@ @<Calculate the turning angles...@>=
k←0; s←p; n←path_size;
repeat t←link(s);
delta_x[k]←x_coord(t)-x_coord(s);
delta_y[k]←y_coord(t)-y_coord(s);
delta[k]←pyth_add(delta_x[k],delta_y[k]);
if k>0 then
begin sine←make_fraction(delta_y[k-1],delta[k-1]);
cosine←make_fraction(delta_x[k-1],delta[k-1]);
psi[k]←m_atan(take_fraction(delta_x[k],cosine)+
take_fraction(delta_y[k],sine),
take_fraction(delta_y[k],cosine)-
take_fraction(delta_x[k],sine));
end;
@:METAFONT capacity exceeded path size}{\quad path size@>
incr(k); s←t;
if k=path_size then overflow("path size",path_size);
if s=q then n←k;
until (k≥n)∧(left_type(s)≠end_cycle);
if k=n then psi[n]←0
@ When we get to this point of the code, |right_type(p)| is either
|given| or |curl| or |open|. If it is |open|, we must have
|left_type(p)=end_cycle| or |left_type(p)=explicit|. In the latter
case, the |open| type is converted to |given|; however, if the
velocity coming in to this knot is zero, the |open| type is
converted to a |curl|, since we don't know the incoming direction.
Similarly, |left_type(q)| is either |given| or |curl| or |open| or
|end_cycle|. The |open| possibility is reduced either to |given| or to |curl|.
@<Remove |open| types at the breakpoints@>=
if left_type(q)=open then
begin x_coord(right_delta)←right_x(q)-x_coord(q);
y_coord(right_delta)←right_y(q)-y_coord(q);
if (x_coord(right_delta)=0)∧(y_coord(right_delta)=0) then
begin left_type(q)←curl; left_curl(q)←unity;
end
else begin left_type(q)←given; left_given(q)←right_delta;
end;
if (right_type(p)=open)∧(left_type(p)=explicit) then
begin x_coord(left_delta)←x_coord(q)-left_x(q);
y_coord(left_delta)←y_coord(q)-left_y(q);
if (x_coord(left_delta)=0)∧(y_coord(left_delta)=0) then
begin right_type(q)←curl; right_curl(q)←unity;
end
else begin right_type(q)←given; right_given(q)←left_delta;
end
@ Linear equations need to be solved whenever |n>1|; and also when |n=1|
and exactly one of the breakpoints involves a curl. The simplest case occurs
when |n=1| and there is a curl at both breakpoints; then we simply draw
a straight line.
But before coding up the simple cases, we might as well face the general case,
since we must deal with it sooner or later, and since the general case
is likely to give some insight into the way simple cases are handled best.
When there is no cycle, the linear equations to be solved form a tri-diagonal
system, and we can apply the standard technique of Gaussian elimination
to convert that system to a sequence of equations of the form
$$\theta_0+u_0\theta_1=v_0,\quad
\theta_1+u_1\theta_2=v_1,\quad\ldots,
\theta_{n-1}+u_{n-1}\theta_n=v_{n-1},\quad
\theta_n=v_n.$$
It is possible to do this diagonalization while generating the equations.
Once $\theta_n$ is known, it is easy to determine $\theta_{n-1}$, \dots,
$\theta_1$, $\theta_0$; thus, the equations will be solved.
The procedure is slightly more complex when there is a cycle, but the basic
idea will be nearly the same. In the right-hand sides will be $v_k+w_k\theta_0$
instead of simply $v_k$, and we will start the process off with $u_0=v_0=0$,
$w_0=1$. The final equation will be not $\theta_n=v_n$ but
$\theta_n+u_n\theta_1=v_n+w_n\theta_0$; an appropriate ending routine
will take account of the fact that $\theta_n=\theta_0$ and eliminate
the $w$'s from the system, after which the solution can be obtained
as before.
When $u_k$, $v_k$, and $w_k$ are being computer, the three pointer
variables |r|, |s|,~|t| will point respectively to knots |k-1|, |k|,
and~|k+1|. The $u$'s and $w$'s are scaled by $2↑{28}$, i.e., they are
of type |fraction|; the $\theta$'s and $v$'s are of type |angle|.
@<Glob...@>=
@!theta:array[0..path_size] of angle; {values of $\theta_k$}
@!uu:array[0..path_size] of fraction; {values of $u_k$}
@!vv:array[0..path_size] of angle; {values of $v_k$}
@!ww:array[0..path_size] of fraction; {values of $w_k$}
@ Our immediate problem is to get the ball rolling by setting up the
first equation or by realizing that no equations are needed, and to fit
this initialization into a framework suitable for the overall computation.
@<Solve linear equations and fill in the control information...@>=
k←0; s←p;
loop@+ begin t←link(s);
if k=0 then @<Get the linear equations started; or |goto done|
with the control points in place, if linear equations
needn't be solved@>
else case left_type(s) of
end_cycle,open:@<Set up equation to match mock curvatures
at $z_k$; then |goto found| with $\theta_n$
adjusted to equal $\theta_0$, if a cycle has ended@>;
curl:@<Set up equation for a curl at $\theta_n$
and |goto found|@>;
given:@<Calculate the given value of $\theta_n$
and |goto found|@>;
end; {there are no other cases}
r←s; s←t; incr(k);
end;
found:@<Finish choosing angles and assigning control points@>;
done:
@ On the first time through the loop, we have |k=0| and |r| is not yet
defined. The first linear equation, if any, will have $A_0=B_0=0$.
@<Get the linear equations started...@>=
case right_type(s) of
given: if left_type(t)=given then @<Reduce to simple case of two givens
and |goto done|@>
else @<Set up the equation for a given value of $\theta_0$@>;
curl: if left_type(t)=curl then @<Reduce to simple case of straight line
and |goto done|@>
else @<Set up the equation for a curl at $\theta_0$@>;
open: begin uu[0]←0; vv[0]←0; ww[0]←fraction_one;
end; {this begins a cycle}
end {there are no other cases}
@ The general equation that specifies equality of mock curvature at $z_k$ is
$$A_k\theta_{k-1}+(B_k+C_k)\theta_k+D_k\theta\k=-B_k\psi_k-D_k\psi\k,$$
as derived above. We want to combine this with the already-derived equation
$\theta_{k-1}+u_{k-1}\theta_k=v_{k-1}+w_{k-1}\theta_0$ in order to obtain
a new equation
$\theta_k+u_k\theta\k=v_k+w_k\theta_0$. This can be done by dividing the
equation
$$(B_k-u_{k-1}A_k+C_k)\theta_k+D_k\theta\k=-B_k\psi_k-D_k\psi\k-A_kv_{k-1}
-A_kw_{k-1}\theta_0$$
by $B_k-u_{k-1}A_k+C_k$. The trick is to do this carefully with
fixed-point arithmetic, avoiding the chance of overflow while retaining
suitable precision.
The calculations will be performed in several registers that
provide temporary storage for intermediate quantities.
@<Local variables for |make_choices|@>=
@!aa,@!bb,@!cc,@!dd,@!ee,@!ff,@!acc:fraction; {temporary registers}
@ @<Set up equation to match mock curvatures...@>=
begin @<Calculate the values $\\{aa}=A_k/B_k$, $\\{bb}=D_k/C_k$,
$\\{dd}=(3-\alpha_{k-1})d_{k,k+1}$, $\\{ee}=(3-\beta\k)d_{k-1,k}$,
and $\\{cc}=(B_k-u_{k-1}A_k)/B_k$@>;
@<Calculate the ratio $\\{ff}=C_k/(C_k+B_k-u_{k-1}A_k)$$@>;
uu[k]←take_fraction(ff,bb);
@<Calculate the values of $v_k$ and $w_k$@>;
if left_type(s)=end_cycle then
@<Adjust $\theta_n$ to equal $\theta_0$ and |goto found|@>;
end
@ Since tension values are never less than 3/4, the values |aa| and
|bb| computed here are never more than 4/5.
@<Calculate the values $\\{aa}=...@>=
if right_tension(r)=unity then
begin aa←fraction_half; dd←2*delta[k];
end
else begin aa←make_fraction(unity,3*right_tension(r)-1);
dd←take_fraction(delta[k],
fraction_three-make_fraction(unity,right_tension(r)));
end;
if left_tension(t)=unity then
begin bb←fraction_half; ee←2*delta[k-1];
end
else begin bb←make_fraction(unity,3*left_tension(t)-1);
ee←take_fraction(delta[k-1],
fraction_three-make_fraction(unity,left_tension(t)));
end;
cc←fraction_one-take_fraction(uu[k-1],aa)
@ The ratio to be calculated in this step can be written in the form
$$\beta_k↑2\cdot\\{ee}\over\beta_k↑2\cdot\\{ee}+\alpha_k↑2\cdot\\{cc}\\{dd},$$
because of the quantities just calculated. The values of |dd| and |ee|
will not be needed after this step has been performed.
@<Calculate the ratio $\\{ff}=C_k/(C_k+B_k-u_{k-1}A_k)$$@>=
dd←take_fraction(dd,cc);
if left_tension(s)≠right_tension(s) then {$\beta_k≠\alpha_k$}
if left_tension(s)<right_tension(s) then
begin ff←make_fraction(left_tension(s),right_tension(s));
ff←take_fraction(ff,ff); {$\beta_k↑2/\alpha_k↑2$}
ee←take_fraction(ee,ff);
end
else begin ff←make_fraction(right_tension(s),left_tension(s));
ff←take_fraction(ff,ff); {$\alpha_k↑2/\beta_k↑2$}
dd←take_fraction(dd,ff);
end;
ff←make_fraction(ee,ee+dd)
@ The value of $u_[k-1]$ will be |≤1| except when $k=1$ and the previous
equation was specified by a curl. In that case we must use a special
method of computation to prevent overflow.
Fortunately, the calculations turn out to be even simpler in this ``hard''
case. The curl equation makes $w_0=0$ and $v_0=-u_0\psi_1$, hence
$-B_1\psi_1-A_1v_0=-(B_1-u_0A_1)\psi_1=-\\{cc}\cdot B_1\psi_1$.
@<Calculate the values of $v_k$ and $w_k$@>=
acc←-take_fraction(psi[k+1],uu[k]);
if right_type(r)=curl then
begin ww[k]←0;
vv[k]←acc-take_fraction(psi[1],fraction_one-ff);
end
else begin ff←make_fraction(fraction_one-ff,cc); {this is
$B_k/(C_k+B_k-u_{k-1}A_k)<5$}
acc←acc-take_fraction(psi[k],ff);
ff←take_fraction(ff,aa); {this is $A_k/(C_k+B_k-u_{k-1}A_k)$}
vv[k]←acc-take_fraction(vv[k-1],ff);
if ww[k-1]=0 then ww[k]←0
else ww[k]←-take_fraction(ww[k-1],ff);
end
@ When a complete cycle has been traversed, we have $\theta_k+u_k\theta\k=
v_k+w_k\theta_0$, for |1≤k≤n|. We would like to determine the value of
$\theta_n$ ad reduce the system to the form $\theta_k+u_k\theta\k=v_k$
for |0≤k<n|, so that the cyclic case can be finished up just as if there
were no cycle.
The idea in the following code is to observe that
$$\eqalign{\theta_n&=v_n+w_n\theta_0-u_n\theta_1=\cdots\cr
&=v_n+w_n\theta_0-u_n\bigl(v_1+w_1\theta_0-u_1(v_2+\cdots
-u_{n-2}(v_{n-1}+w_{n-1}\theta_0-u_{n-1}\theta_0))\bigr),\cr}$$
so we can solve for $\theta_n=\theta_0$.
@<Adjust $\theta_n$ to equal $\theta_0$ and |goto found|@>=
begin aa←0; bb←fraction_one; {we have |k=n|}
repeat decr(k);
if k=0 then k←n;
aa←vv[k]-take_fraction(aa,uu[k]);
bb←ww[k]-take_fraction(bb,uu[k]);
until k=n;
aa←make_fraction(aa,fraction_one-bb);
theta[n]←aa; vv[0]←aa;
for k←1 to n-1 do vv[k]←vv[k]+take_fraction(aa,ww[k]);
goto found;
end
@ @<Calculate the given value of $\theta_n$...@>=
begin r←left_given(s);@/
sine←make_fraction(delta_y[n-1],delta[n-1]);
cosine←make_fraction(delta_x[n-1],delta[n-1]);@/
theta[n]←n_atan(
-take_fraction(x_coord(r),cosine)-take_fraction(y_coord(r),sine),@|
-take_fraction(y_coord(r),cosine)+take_fraction(x_coord(r),sine));
goto found;
end
@ @<Set up the equation for a given value of $\theta_0$@>=
begin r←right_given(s);@/
sine←make_fraction(delta_y[0],delta[0]);
cosine←make_fraction(delta_x[0],delta[0]);@/
vv[0]←n_atan(
take_fraction(x_coord(r),cosine)+take_fraction(y_coord(r),sine),@|
take_fraction(y,coord(r),cosine)-take_fraction(x_coord(r),sine));
uu[0]←0; ww[0]←0;
end
@ @<Set up the equation for a curl at $\theta_0$@>=
begin cc←right_curl(s);
if (right_tension(s)=unity)∧(left_tension(t)=unity) then
uu[0]←make_fraction(cc+cc+1,cc+2)
else uu[0]←curl_ratio(cc,right_tension(s),left_tension(t));
vv[0]←-take_fraction(psi[1],uu[0]); ww[0]←0;
end
@ @<Set up equation for a curl at $\theta_n$...@>=
begin cc←left_curl(s);
if (right_tension(r)=unity)∧(left_tension(s)=unity) then
ff←make_fraction(cc+cc+1,cc+2)
else ff←curl_ratio(cc,left_tension(s),right_tension(r));
theta[n]←-make_fraction(take_fraction(vv[n-1],ff),
fraction_one-take_fraction(ff,uu[n-1]));
goto found;
end
@ The |curl_ratio| subroutine has three arguments, which our previous notation
encourages us to call $\gamma$, $\alpha↑{-1}$, and $\beta↑{-1}$. It is
a somewhat tedious program to calculate
$${(3-\alpha)\alpha↑2\gamma+\beta↑3\over
\alpha↑3\gamma+(3-\beta)\beta↑2},$$
with the result reduced to 4 if it exceeds 4. (This reduction of curl
is necessary only if the curl and tension are both large.)
@<Declare procedures needed by |make_choices|@>=
function curl_ratio(@!gamma,@!a_tension,@!b_tension:scaled):fraction;
var @!alpha,@!beta,@!num,@!denom,@!ff:fraction; {registers}
begin alpha←make_fraction(unity,a_tension);
beta←make_fraction(unity,b_tension);@/
if alpha≤beta then
begin ff←make_fraction(alpha,beta); ff←take_fraction(ff,ff);
gamma←take_fraction(gamma,ff);@/
beta←beta div @'10000; {convert |fraction| to |scaled|}
denom←take_fraction(gamma,alpha)+unity+unity+unity-beta;
num←take_fraction(gamma,fraction_three-alpha)+beta;
end
else begin ff←make_fraction(beta,alpha); ff←take_fraction(ff,ff);
beta←beta div @'10000; {convert |fraction| to |scaled|}
denom←take_fraction(gamma,alpha)+
take_fraction(unity+unity+unity-beta,ff);
num←take_fraction(gamma,fraction_three-alpha)+
take_fraction(beta,ff);
end;
if num≥denom+denom+denom+denom then curl_ratio←fraction_four
else curl_ratio←make_fraction(num,denom);
end;
@ We're in the home stretch now.
@<Finish choosing angles and assigning control points@>=
for k←n-1 downto 0 do theta[k]←vv[k]-take_fraction(theta[k+1],uu[k]);
s←p; k←0;
repeat t←link(s);@/
n_sin_cos(theta[k]); st←n_sin; ct←n_cos;@/
n_sin_cos(-psi[k+1]-theta[k+1]); sf←n_sin; cf←n_cos;@/
set_controls(s,t,k);@/
incr(k); s←t;
until k=n
@ The |set_controls| routine actually puts the control points into
a pair of consecutive nodes |p| and~|q|. Global variables are used to
record the values of $\sin\theta$, $\cos\theta$, $\sin\phi$, and
$\cos\phi$ needed in this calculation.
@<Glob...@>=
@!st,@!ct,@!sf,@!cf:fraction; {sines and cosines}
@ @<Declare procedures needed by |make_choices|@>=
procedure set_controls(@!p,@!q:pointer;@!k:integer);
var @!rr,@!ss:fraction; {velocities, divided by thrice the tension}
begin rr←velocity(st,ct,sf,cf,right_tension(p));
ss←velocity(sf,cf,st,ct,left_tension(q));
right_x(p)←x_coord(p)+take_fraction(
take_fraction(delta_x[k],ct)-take_fraction(delta_y[k],st),rr);
right_y(p)←y_coord(p)+take_fraction(
take_fraction(delta_y[k],ct)+take_fraction(delta_x[k],st),rr);
left_x(q)←x_coord(q)-take_fraction(
take_fraction(delta_x[k],cf)-take_fraction(delta_y[k],sf),ss);
right_y(q)←y_coord(q)+take_fraction(
take_fraction(delta_y[k],cf)+take_fraction(delta_x[k],sf),ss);
@<Adjust |right_type(p)| and |left_type(q)|
to be either |explicit| or |smooth|@>;
end;
@ A previous execution of this step may have set |left_type(p)=smooth|,
but only if |right_type(p)=given| and |right_given(p)| pointed to
to the same place as the former |left_given(p)|. Similarly,
in a closed path, |right_type(q)| might already be |smooth|.
@<Adjust |right_type(p)| and |left_type(q)|...@>=
case right_type(p) of
curl: right_type(p)←explicit;
open: right_type(p)←smooth;
given: if (left_type(p)=given)∧(left_given(p)=right_given(p)) then
right_type(p)←smooth
else if left_type(p)=smooth then right_type(p)←smooth
else right_type(p)←explicit;
end; {there should be no other cases}
case left_type(q) of
curl: left_type(q)←explicit;
open,end_cycle: left_type(q)←smooth;
given: if (right_type(q)=given)∧(right_given(q)=left_given(q)) then
left_type(q)←smooth
else if right_type(q)=smooth then left_type(q)←smooth
else left_type(q)←explicit;
end {there should be no other cases}
@ Only the simple cases remain to be handled.
@ @<Reduce to simple case of two givens and |goto done|@>=
begin cosine←make_fraction(delta_x[0],delta[0]);
sine←make_fraction(delta_y[0],delta[0]);
r←right_given(p);@/
ct←take_fraction(x_coord(r),cosine)+take_fraction(y_coord(r),sine);
st←take_fraction(y_coord(r),cosine)-take_fraction(x_coord(r),sine);@/
aa←pyth_add(st,ct); ct←take_fraction(ct,aa); st←take_fraction(st,aa);@/
r←left_given(q);@/
cf←take_fraction(x_coord(r),cosine)+take_fraction(y_coord(r),sine);
sf←-take_fraction(y_coord(r),cosine)+take_fraction(x_coord(r),sine);@/
aa←pyth_add(sf,cf); cf←take_fraction(cf,aa); sf←take_fraction(sf,aa);@/
set_controls(p,q,0); goto done;
end
@ @<Reduce to simple case of straight line and |goto done|@>=
begin right_type(p)←explicit; left_type(q)←explicit;
if right_tension(p)=unity then
begin right_x(p)←x_coord(p)+((delta_x[0]+1) div 3);
right_y(p)←y_coord(p)+((delta_y[0]+1) div 3);
end
else begin ff←make_fraction(unity,3*right_tension(p)); {$\alpha/3$}
right_x(p)←x_coord(p)+take_fraction(delta_x[0],ff);
right_y(p)←y_coord(p)+take_fraction(delta_y[0],ff);
end;
if left_tension(q)=unity then
begin left_x(q)←x_coord(q)-((delta_x[0]+1) div 3);
left_y(q)←y_coord(q)-((delta_y[0]+1) div 3);
end
else begin ff←make_fraction(unity,3*left_tension(q)); {$\beta/3$}
left_x(q)←x_coord(q)-take_fraction(delta_x[0],ff);
left_y(q)←y_coord(q)-take_fraction(delta_y[0],ff);
end;
goto done;
end
@* \[xx] File names.
Besides the fact that different operating systems treat files in different ways,
we must cope with the fact that completely different naming conventions
are used. The following programs show what is required for one particular
operating system; similar routines for other systems are not difficult
to devise.
@↑fingers@>
@↑system dependencies@>
\MF\ assumes that a file name has three parts: the name proper, its
``extension'', and a ``file area'' where it is found in an external file
system. The extension of an input file or a write file is assumed to be
`\.{.MF}' unless otherwise specified; it is `\.{.log}' on the
transcript file that records each run of \MF; it is `\.{.tfm}' on the font
metric files that describe characters in the fonts \MF\ uses; it is
`\.{.dvi}' on the output files that specify typesetting information; and it
is `\.{.pkg}' on the package files written by \.{INIMF} to initialize \MF.
The file area can be arbitrary on input files, but it is usually the
user's current area when a file is output. If an input file cannot be
found on the specified area, \MF\ will look for it on a special system
area; this special area is intended for commonly used input files like
\.{cmbase.MF}.
Simple uses of \MF\ refer only to file names that have no explicit
extension or area. For example, a person usually says `\.{input} \.{cmr10}'
instead of `\.{input} \.{cmr10.new}'. Simple file
names are best, because they make the \MF\ source files portable;
whenever a file name consists entirely of letters and digits, it should be
treated in the same way by all implementations of \MF. However, users
need the ability to refer to other files in their environment, especially
when responding to error messages concerning unopenable files; therefore
we want to let them use the syntax that appears in their favorite
operating system.
@ In order to isolate the system-dependent aspects of file names, the
@↑system dependencies@>
system-independent parts of \MF\ make use of three system-dependent
procedures that are called |begin_name|, |more_name|, and |end_name|. In
essence, if the user-specified characters of the file name are $c_1\ldots c_n$,
the system-independent driver program does the operations
$$|begin_name|;\,|more_name|(c_1);\ldots\,;|more_name|(c_n);
\,|end_name|.$$
These three procedures communicate with each other via global variables.
Afterwards the file name will appear in the string pool as three strings
called |cur_name|\penalty10000\hskip-.05em,
|cur_area|, and |cur_ext|; the latter two are null (i.e.,
|""|), unless they were explicitly specified by the user.
Actually the situation is slightly more complicated, because \MF\ needs
to know when the file name ends. The |more_name| routine is a function
(with side effects) that returns |true| on the calls |more_name|$(c_1)$,
\dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$
returns |false|; or, it returns |true| and the token following $c_n$ is
something like `\.{begingroup}' (i.e., not a character). In other words,
|more_name| is supposed to return |true| unless it is sure that the
file name has been completely scanned; and |end_name| is supposed to be able
to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of
whether $|more_name|(c_n)$ returned |true| or |false|.
@<Glob...@>=
@!cur_name:str_number; {name of file just scanned}
@!cur_area:str_number; {file area just scanned, or \.{""}}
@!cur_ext:str_number; {file extension just scanned, or \.{""}}
@ The file names we shall deal with have the following structure:
If the name contains `\.>' or `\.:', the file area consists of all characters
up to and including the final such character; otherwise the file area is null.
If the remaining file name contains `\..', the file extension consists of all
such characters from the first `\..' to the end, otherwise the file extension
is null.
@↑system dependencies@>
We can scan such file names easily by using two global variables that keep track
of the occurrences of area and extension delimiters:
@<Glob...@>=
@!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any}
@!ext_delimiter:pool_pointer; {the relevant `\..', if any}
@ Input files that can't be found in the user's area may appear in a standard
system area called |MF_area|.
This system area name will, of course, vary from place to place.
@↑system dependencies@>
@d MF_area=="MFinputs:"
@.MFinputs@>
@ Here now is the first of the system-dependent routines for file name scanning.
@↑system dependencies@>
@p procedure begin_name;
begin area_delimiter←0; ext_delimiter←0;
end;
@ And here's the second.
@↑system dependencies@>
@p function more_name(@!c:ASCII_code):boolean;
begin if c=" " then more_name←false
else begin if (c=">")∨(c=":") then
begin area_delimiter←pool_ptr; ext_delimiter←0;
end
else if (c=".")∧(ext_delimiter=0) then ext_delimiter←pool_ptr;
str_room(1); append_char(c); {contribute |c| to the current string}
more_name←true;
end;
end;
@ The third.
@↑system dependencies@>
@p procedure end_name;
begin if str_ptr+3>max_strings then
overflow("number of strings",max_strings-init_str_ptr);
@:METAFONT capacity exceeded number of strings}{\quad number of strings@>
if area_delimiter=0 then cur_area←""
else begin cur_area←str_ptr; incr(str_ptr);
str_start[str_ptr]←area_delimiter+1;
end;
if ext_delimiter=0 then
begin cur_ext←""; cur_name←make_string;
end
else begin cur_name←str_ptr; incr(str_ptr);
str_start[str_ptr]←ext_delimiter; cur_ext←make_string;
end;
end;
@ Conversely, here is a routine that takes three strings and prints a file
name that might have produced them. (The routine is system dependent, because
some operating systems put the file area last instead of first.)
@↑system dependencies@>
@<Basic printing...@>=
procedure print_file_name(@!n,@!a,@!e:integer);
begin print(a); print(n); print(e);
end;
@ Another system-dependent routine is needed to convert three \MF\ strings
into the |name_of_file| value that is used to open files. The present code
allows both lowercase and uppercase letters in the file name.
@↑system dependencies@>
@d append_to_name(#)==begin c←#; incr(k);
if k≤file_name_size then name_of_file[k]←xchr[c];
end
@p procedure pack_file_name(@!n,@!a,@!e:str_number);
var k:integer; {number of positions filled in |name_of_file|}
@!c: ASCII_code; {character being packed}
@!j:pool_pointer; {index into |str_pool|}
begin k←0;
for j←str_start[a] to str_start[a+1]-1 do append_to_name(str_pool[j]);
for j←str_start[n] to str_start[n+1]-1 do append_to_name(str_pool[j]);
for j←str_start[e] to str_start[e+1]-1 do append_to_name(str_pool[j]);
if k≤file_name_size then name_length←k@+else name_length←file_name_size;
for k←name_length+1 to file_name_size do name_of_file[k]←' ';
end;
@ A messier routine is also needed, since package file names must be scanned
before \MF's string mechanism has been initialized. We shall use the
global variable |MF_pkg_default| to supply the text for default system areas
and extensions related to package files.
@↑system dependencies@>
@d pkg_default_length=20 {length of the |MF_pkg_default| string}
@d pkg_area_length=11 {length of its area part}
@d pkg_ext_length=4 {length of its `\.{.pkg}' part}
@<Glob...@>=
@!MF_pkg_default:packed array[1..pkg_default_length] of char;
@ @<Set init...@>=
MF_pkg_default←'TeXformats:PLAIN.pkg';
@.TeXformats@>
@.PLAIN@>
@↑system dependencies@>
@ @<Check the ``constant'' values for consistency@>=
if pkg_default_length>file_name_size then bad←31;
@ Here is the messy routine that was just mentioned. It sets |name_of_file|
from the first |n| characters of |MF_pkg_default|, followed by
|buffer[a..b]|, followed by the last |pkg_ext_length| characters of
|MF_pkg_default|.
We dare not give error messages here, since \MF\ calls this routine before
the |error| routine is ready to roll. Instead, we simply drop excess characters,
since the error will be detected in another way when a strange file name
isn't found.
@↑system dependencies@>
@p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer);
var k:integer; {number of positions filled in |name_of_file|}
@!c: ASCII_code; {character being packed}
@!j:integer; {index into |buffer| or |MF_pkg_default|}
begin if n+b-a+1+pkg_ext_length>file_name_size then
b←a+file_name_size-n-1-pkg_ext_length;
k←0;
for j←1 to n do append_to_name(xord[MF_pkg_default[j]]);
for j←a to b do append_to_name(buffer[j]);
for j←pkg_default_length-pkg_ext_length+1 to pkg_default_length do
append_to_name(xord[MF_pkg_default[j]]);
if k≤file_name_size then name_length←k@+else name_length←file_name_size;
for k←name_length+1 to file_name_size do name_of_file[k]←' ';
end;
@ Here is the only place we use |pack_buffered_name|. This part of the program
becomes active when a ``virgin'' \MF\ is trying to get going, just after
the preliminary initialization, or when the user is substituting another
package file by typing `\.\&' after the initial `\.{**}' prompt. The buffer
contains the first line of input in |buffer[loc..(last-1)]|, where
|loc<last| and |buffer[loc]≠" "|.
@<Declare the function called |open_pkg_file|@>=
function open_pkg_file:boolean;
label found,exit;
var j:0..buf_size; {the first space after the file name}
begin if buffer[loc]="&" then
begin incr(loc); j←loc; buffer[last]←" ";
while buffer[j]≠" " do incr(j);
pack_buffered_name(0,loc,j-1); {try first without the system file area}
if w_open_in(pkg_file) then
begin loc←j; goto found;
end;@/
pack_buffered_name(pkg_area_length,loc,j-1);
{now try the system package file area}
if w_open_in(pkg_file) then
begin loc←j; goto found;
end;
wake_up_terminal;
wterm_ln('Sorry, I can''t find that package;',' will try PLAIN.');
@.Sorry, I can't find...@>
end;
{now pull out all the stops: try for the system \.{PLAIN} file}
pack_buffered_name(pkg_default_length-pkg_ext_length,1,0);
if ¬ w_open_in(pkg_file) then
begin wake_up_terminal;
wterm_ln('I can''t find the PLAIN package file!');
@.I can't find PLAIN...@>
@.PLAIN@>
open_pkg_file←false; return;
end;
found:open_pkg_file←true;
exit:end;
@ Operating systems often make it possible to determine the exact name (and
possible version number) of a file that has been opened. The following routine,
which simply makes a \MF\ string from the value of |name_of_file|, should
ideally be changed to deduce the full name of file~|f|, if it is
possible to do this in a \PASCAL\ program.
@↑system dependencies@>
@p function make_name_string:str_number;
var k:1..file_name_size; {index into |name_of_file|}
begin str_room(name_length);
for k←1 to name_length do append_char(xord[name_of_file[k]]);
make_name_string←make_string;
end;
function a_make_name_string(var f:alpha_file):str_number;
begin a_make_name_string←make_name_string;
end;
function b_make_name_string(var f:byte_file):str_number;
begin b_make_name_string←make_name_string;
end;
function w_make_name_string(var f:word_file):str_number;
begin w_make_name_string←make_name_string;
end;
@ Now let's consider the routines by which \MF\ deals with file names
in a system-independent manner. First comes a procedure that looks for a
file name in the input by calling |get_x_token| for the information.
@p procedure scan_file_name;
label done;
begin name_in_progress←true; begin_name;
@<Get the next non-blank non-call...@>;
loop@+begin if (cur_cmd>other_char)∨(cur_chr>127) then {not a character}
begin back_input; goto done;
end;
if ¬ more_name(cur_chr) then goto done;
get_x_token;
end;
done: end_name; name_in_progress←false;
end;
@ The global variable |name_in_progress| is used to prevent recursive
use of |scan_file_name|, since the |begin_name| and other procedures
communicate via global variables. Recursion would arise only by
devious tricks like `\.{input input f}'; such attempts at sabotage
must be thwarted.
@↑recursion@>
Another global variable, |job_name|, contains the file name that was first
\.{input} by the user. This name is extended by `\.{log}' and `\.{dvi}'
and `\.{pkg}' in order to make the names of \MF's output files.
@<Glob...@>=
@!name_in_progress:boolean; {is a file name being scanned?}
@!job_name:str_number; {principal file name}
@ Initially |job_name=0|; it becomes nonzero as soon as the true name is known.
We have |job_name=0| if and only if the `\.{log}' file has not been opened,
except of course for a short time just after |job_name| has become nonzero.
@<Initialize the output...@>=job_name←0; name_in_progress←false;
@ Here is a routine that manufactures the output file names, assuming that
|job_name≠0|. It ignores and changes the current settings of |cur_area|
and |cur_ext|.
@d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext)
@p procedure pack_job_name(@!s:str_number); {|s = ".log"|, |".dvi"|, or
|".pkg"|}
begin cur_area←""; cur_ext←s;
cur_name←job_name; pack_cur_name;
end;
@ If some trouble arises when \MF\ tries to open a file, the following
routine calls upon the user to supply another file name. Parameter~|s|
is used in the error message to identify the type of file; parameter~|e|
is the default extension if none is given. Upon exit from the routine,
variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are
ready for another attempt at file opening.
@p procedure prompt_file_name(@!s,@!e:str_number);
label done;
var k:0..buf_size; {index into |buffer|}
begin if interaction=scroll_mode then wake_up_terminal;
if s="input file name" then print_err("I can't find file `")
@.I can't find file x@>
else print_err("I can't write on file `");
@.I can't write on file x@>
print_file_name(cur_name,cur_area,cur_ext); print("'.");
if e=".mf" then show_context;
print_nl("Please type another "); print(s);
@.Please type...@>
if interaction<scroll_mode then
fatal_error("*** (job aborted, file error in nonstop mode)");
@.job aborted, file error...@>
clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>;
if cur_ext="" then cur_ext←e;
pack_cur_name;
end;
@ @<Scan file name in the buffer@>=
begin begin_name; k←first;
while (buffer[k]=" ")∧(k<last) do incr(k);
loop@+ begin if k=last then goto done;
if ¬ more_name(buffer[k]) then goto done;
incr(k);
end;
done:end_name;
end
@ Here's an example of how these conventions are used. We shall use the
macro |ensure_dvi_open| when it is time to ship out a box of stuff.
@d ensure_dvi_open==if output_file_name=0 then
begin if job_name=0 then open_log_file;
pack_job_name(".dvi");
while ¬ b_open_out(dvi_file) do
prompt_file_name("file name for output",".dvi");
output_file_name←b_make_name_string(dvi_file);
end
@<Glob...@>=
@!dvi_file: byte_file; {the device-independent output goes here}
@!output_file_name: str_number; {full name of the output file}
@!log_name: str_number; {full name of the log file}
@ @<Initialize the output...@>=output_file_name←0;
@ The |open_log_file| routine is used to open the transcript file and to help
it catch up to what has previously been printed on the terminal.
@p procedure open_log_file;
var old_setting:0..max_selector; {previous |selector| setting}
@!k:0..buf_size; {index into |months| and |buffer|}
@!l:0..buf_size; {end of first input line}
@!months:packed array [1..36] of char; {abbreviations of month names}
begin old_setting←selector;
if job_name=0 then job_name←"mfput";
pack_job_name(".log");
while ¬ a_open_out(log_file) do @<Try to get a different log file name@>;
log_name←a_make_name_string(log_file);
selector←log_only;
@<Print the banner line, including the date and time@>;
input_stack[input_ptr]←cur_input; {make sure bottom level is in memory}
print_nl("**");
@.**@>
l←input_stack[0].limit_field; {last position of first line}
if buffer[l]=end_line_char then decr(l);
for k←1 to l do print(buffer[k]);
print_ln; {now the transcript file contains the first line of input}
selector←old_setting+2; {|log_only| or |term_and_log|}
end;
@ Sometimes |open_log_file| is called at awkward moments when \MF\ is
unable to print error messages or even to |show_context|.
Therefore the program is careful not to call |prompt_file_name| if a
fatal error could result.
Incidentally, the program always refers to the log file as a `\.{transcript
file}', because some systems cannot use the extension `\.{.log}' for
this file.
@<Try to get a different log file name@>=
begin if interaction<scroll_mode then {bypass |fatal_error|}
begin print_err("I can't write on file `");
@.I can't write on file x@>
print_file_name(cur_name,cur_area,cur_ext); print("'.");@/
job_name←0; history←fatal_error_stop; jump_out;
end; {abort the program without a log file}
prompt_file_name("transcript file name",".log");
end
@ @<Print the banner...@>=
begin wlog(banner);
print(pkg_ident); print(" ");
print_int(day); print_char(" ");
months←'JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC';
for k←3*month-2 to 3*month do wlog(months[k]);
print_char(" "); print_int(year); print_char(" ");
print_two(time div 60); print_char(":"); print_two(time mod 60);
end
@ Let's turn now to the procedure that is used to initiate file reading
when an `\.{input}' command is being processed.
@p procedure start_input; {\MF\ will \.{input} something}
label done;
begin scan_file_name; {set |cur_name| to desired file name}
if cur_ext="" then cur_ext←".mf";
pack_cur_name;
loop@+ begin begin_file_reading; {set up |cur_file| and new level of input}
if a_open_in(cur_file) then goto done;
pack_file_name(cur_name,MF_area,cur_ext);
if a_open_in(cur_file) then goto done;
end_file_reading; {remove the level that didn't work}
prompt_file_name("input file name",".mf");
end;
done: name←a_make_name_string(cur_file);
if job_name=0 then
begin job_name←cur_name; open_log_file;
end; {|open_log_file| doesn't |show_context|, so |limit|
and |loc| needn't be set to meaningful values yet}
if term_offset+length(name)>max_print_line-2 then print_ln
else if (term_offset>0)∨(file_offset>0) then print_char(" ");
print_char("("); print(name); update_terminal; state←new_line;
@<Read the first line of the new file@>;
end;
@ Here we have to remember to tell the |input_ln| routine not to
start with a |get|. If the file is empty, it is considered to
contain a single blank line.
@↑system dependencies@>
@↑empty line at end of file@>
@<Read the first line...@>=
begin if ¬ input_ln(cur_file,false) then do_nothing;
firm_up_the_line;
if (end_line_char<0)∨(end_line_char>127) then decr(limit)
else buffer[limit]←end_line_char;
first←limit+1; loc←start; line←1;
end
@* \[xx] Font metric data.
\MF\ gets its knowledge about fonts from font metric files, also called
\.{TFM} files; the `\.T' in `\.{TFM}' stands for \MF,
but other programs know about them too.
@:TFM files}{\.{TFM} files@>
@↑font metric files@>
The information in a \.{TFM} file appears in a sequence of 8-bit bytes.
Since the number of bytes is always a multiple of 4, we could
also regard the file as a sequence of 32-bit words, but \MF\ uses the
byte interpretation. The format of \.{TFM} files was designed by
Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds
@↑Ramshaw, Lyle Harold@>
of information in a compact but useful form.
@<Glob...@>=
@!tfm_file:byte_file;
@ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit
integers that give the lengths of the various subsequent portions
of the file. These twelve integers are, in order:
$$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr
|lf|&length of the entire file, in words;\cr
|lh|&length of the header data, in words;\cr
|bc|&smallest character code in the font;\cr
|ec|&largest character code in the font;\cr
|nw|&number of words in the width table;\cr
|nh|&number of words in the height table;\cr
|nd|&number of words in the depth table;\cr
|ni|&number of words in the italic correction table;\cr
|nl|&number of words in the lig/kern table;\cr
|nk|&number of words in the kern table;\cr
|ne|&number of words in the extensible character table;\cr
|np|&number of font parameter words.\cr}}$$
They are all nonnegative and less than $2↑{15}$. We must have |bc-1≤ec≤255|,
and
$$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$
Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|),
and as few as 0 characters (if |bc=ec+1|). An exception to these rules is
planned for oriental fonts, which will be identified by the condition |ec=256|;
such fonts are not allowed except in extensions to \MF84.
@↑oriental characters@>@↑Chinese characters@>@↑Japanese characters@>
Incidentally, when two or more 8-bit bytes are combined to form an integer of
16 or more bits, the most significant bytes appear first in the file.
This is called BigEndian order.
@!@↑BigEndian order@>
@ The rest of the \.{TFM} file may be regarded as a sequence of ten data
arrays having the informal specification
$$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2}
\vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr
header&|[0..lh-1]@t\\{stuff}@>|\cr
char\_info&|[bc..ec]char_info_word|\cr
width&|[0..nw-1]fix_word|\cr
height&|[0..nh-1]fix_word|\cr
depth&|[0..nd-1]fix_word|\cr
italic&|[0..ni-1]fix_word|\cr
lig\_kern&|[0..nl-1]lig_kern_command|\cr
kern&|[0..nk-1]fix_word|\cr
exten&|[0..ne-1]extensible_recipe|\cr
param&|[1..np]fix_word|\cr}}$$
The most important data type used here is a |@!fix_word|, which is
a 32-bit representation of a binary fraction. A |fix_word| is a signed
quantity, with the two's complement of the entire word used to represent
negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the
binary point; thus, the largest |fix_word| value is $2048-2↑{-20}$, and
the smallest is $-2048$. We will see below, however, that all but two of
the |fix_word| values must lie between $-16$ and $+16$.
@ The first data array is a block of header information, which contains
general facts about the font. The header must contain at least two words,
|header[0]| and |header[1]|, whose meaning is explained below.
Additional header information of use to other software routines might
also be included, but \MF84 does not need to know about such details.
For example, 16 more words of header information are in use at the Xerox
Palo Alto Research Center; the first ten specify the character coding
scheme used (e.g., `\.{XEROX TEXT}' or `\.{TeX MATHSY}'), the next five
give the font family name (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the
last gives the ``face byte.'' The program that converts \.{DVI} files
to Xerox printing format gets this information by looking at the \.{TFM}
file, which it needs to read anyway because of other information that
is not explicitly repeated in \.{DVI} format.
\yskip\hang|header[0]| is a 32-bit check sum that \MF\ will copy into
the \.{DVI} output file. Later on when the \.{DVI} file is printed,
possibly on another computer, the actual font that gets used is supposed
to have a check sum that agrees with the one in the \.{TFM} file used by
\MF. In this way, users will be warned about potential incompatibilities.
(However, if the check sum is zero in either the font file or the \.{TFM}
file, no check is made.) The actual relation between this check sum and
the rest of the \.{TFM} file is not important; the check sum is simply an
identification number with the property that incompatible fonts almost
always have distinct check sums.
@↑check sum@>
\yskip\hang|header[1]| is a |fix_word| containing the design size of
the font, in units of \MF\ points. This number must be at least 1.0; it is
fairly arbitrary, but usually the design size is 10.0 for a ``10 point''
font, i.e., a font that was designed to look best at a 10-point size,
whatever that really means. When a \MF\ user asks for a font
`\.{at} $\delta$ \.{pt}', the effect is to override the design size
and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates
of the points in the font image by a factor of $\delta$ divided by the
design size. {\sl All other dimensions in the\/ \.{TFM} file are
|fix_word| numbers in design-size units.} Thus, for example, the value
of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value
$2↑{20}=1.0$, since many fonts have a design size equal to one em.
The other dimensions must be less than 16 design-size units in absolute
value; thus, |header[1]| and |param[1]| are the only |fix_word|
entries in the whole \.{TFM} file whose first byte might be something
besides 0 or 255.
@ Next comes the |char_info| array, which contains one |@!char_info_word|
per character. Each word in this part of the file contains six fields
packed into four bytes as follows.
\yskip\hang first byte: |@!width_index| (8 bits)\par
\hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index|
(4~bits)\par
\hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag|
(2~bits)\par
\hang fourth byte: |@!remainder| (8 bits)\par
\yskip\noindent
The actual width of a character is \\{width}|[width_index]|, in design-size
units; this is a device for compressing information, since many characters
have the same width. Since it is quite common for many characters
to have the same height, depth, or italic correction, the \.{TFM} format
imposes a limit of 16 different heights, 16 different depths, and
64 different italic corrections.
@!@↑italic correction@>
The italic correction of a character has two different uses.
(a)~In ordinary text, the italic correction is added to the width only if
the \TeX\ user specifies `\.{\\/}' after the character.
(b)~In math formulas, the italic correction is always added to the width,
unless the character has a subscript but no superscript.
Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]=
\\{italic}[0]=0$ should always hold, so that an index of zero implies a
value of zero. The |width_index| should never be zero unless the
character does not exist in the font, since a character is valid if and
only if it lies between |bc| and |ec| and has a nonzero |width_index|.
@ The |tag| field in a |char_info_word| has four values that explain how to
interpret the |remainder| field.
\yskip\hang|tag=0| (|no_tag|) means that |remainder| is unused.\par
\hang|tag=1| (|lig_tag|) means that this character has a ligature/kerning
program starting at |lig_kern[remainder]|.\par
\hang|tag=2| (|list_tag|) means that this character is part of a chain of
characters of ascending sizes, and not the largest in the chain. The
|remainder| field gives the character code of the next larger character.\par
\hang|tag=3| (|ext_tag|) means that this character code represents an
extensible character, i.e., a character that is built up of smaller pieces
so that it can be made arbitrarily large. The pieces are specified in
|@!exten[remainder]|.\par
\yskip\noindent
Characters with |tag=2| and |tag=3| are treated as characters with |tag=0|
unless they are used in special circumstances in math formulas. For example,
\TeX's \.{\\sum} operation looks for a |list_tag|, and the \.{\\left}
operation looks for both |list_tag| and |ext_tag|.
@d no_tag=0 {vanilla character}
@d lig_tag=1 {character has a ligature/kerning program}
@d list_tag=2 {character has a successor in a charlist}
@d ext_tag=3 {character is extensible}
@ The |lig_kern| array contains instructions in a simple programming language
that explains what to do for special letter pairs. Each word in this array is a
|@!lig_kern_command| of four bytes.
\yskip\hang first byte: |stop_bit|, indicates the final program step
if the byte is 128 or more.\par
\hang second byte: |next_char|, ``if |next_char| follows the current character,
then perform the operation and stop, otherwise continue.''\par
\hang third byte: |op_bit|, indicates a ligature step if less than~128,
a kern step otherwise.\par
\hang fourth byte: |remainder|.\par
\yskip\noindent
In a ligature step the current character and |next_char| are replaced by
the single character whose code is |remainder|. In a kern step, an
additional space equal to |@!kern[remainder]| is inserted between the
current character and |next_char|. (The value of |kern[remainder]| is
often negative, so that the characters are brought closer together
by kerning, but it might be positive.)
@d stop_flag=128+min_quarterword
{value indicating `\.{STOP}' in a lig/kern program}
@d kern_flag=128+min_quarterword {op code for a kern step}
@d stop_bit(#)==#.b0
@d next_char(#)==#.b1
@d op_bit(#)==#.b2
@d rem_byte(#)==#.b3
@ Extensible characters are specified by an |@!extensible_recipe|, which
consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this
order). These bytes are the character codes of individual pieces used to
build up a large symbol. If |top|, |mid|, or |bot| are zero, they are not
present in the built-up result. For example, an extensible vertical line is
like an extensible bracket, except that the top and bottom pieces are missing.
Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box
if the piece isn't present. Then the extensible characters have the form
$TR↑kMR↑kB$ from top to bottom, for some |k≥0|, unless $M$ is absent;
in the latter case we can have $TR↑kB$ for both even and odd values of~|k|.
The width of the extensible character is the width of $R$; and the
height-plus-depth is the sum of the individual height-plus-depths of the
components used, since the pieces are butted together in a vertical list.
@d ext_top(#)==#.b0 {|top| piece in a recipe}
@d ext_mid(#)==#.b1 {|mid| piece in a recipe}
@d ext_bot(#)==#.b2 {|bot| piece in a recipe}
@d ext_rep(#)==#.b3 {|rep| piece in a recipe}
@ The final portion of a \.{TFM} file is the |param| array, which is another
sequence of |fix_word| values.
\yskip\hang|param[1]=slant| is the amount of italic slant, which is used
to help position accents. For example, |slant=.25| means that when you go
up one unit, you also go .25 units to the right. The |slant| is a pure
number; it is the only |fix_word| other than the design size itself that is
not scaled by the design size.
\hang|param[2]=space| is the normal spacing between words in text.
Note that character @'40 in the font need not have anything to do with
blank spaces.
\hang|param[3]=space_stretch| is the amount of glue stretching between words.
\hang|param[4]=space_shrink| is the amount of glue shrinking between words.
\hang|param[5]=x_height| is the size of one ex in the font; it is also
the height of letters for which accents don't have to be raised or lowered.
\hang|param[6]=quad| is the size of one em in the font.
\hang|param[7]=extra_space| is the amount added to |param[2]| at the
ends of sentences.
\yskip\noindent
If fewer than seven parameters are present, \MF\ sets the missing parameters
to zero. Fonts used for math symbols are required to have
additional parameter information, which is explained later.
@d slant_code=1
@d space_code=2
@d space_stretch_code=3
@d space_shrink_code=4
@d x_height_code=5
@d quad_code=6
@d extra_space_code=7
@ So that is what \.{TFM} files hold. λλλ Now I leave TeX...
@ Here now is the (rather formidable) array of font arrays.
@<Glob...@>=
@!font_info:array[0..font_mem_size] of memory_word;
{the big collection of font data}
@!fmem_ptr:0..font_mem_size; {first unused word of |font_info|}
@!font_ptr:internal_font_number; {largest internal font number in use}
@!font_ident:array[internal_font_number] of pointer; {the most recent user
font identifier corresponding to an internal font number}
@!font_check:array[internal_font_number] of four_quarters; {check sum}
@!font_size:array[internal_font_number] of scaled; {``at'' size}
@!font_dsize:array[internal_font_number] of scaled; {``design'' size}
@!font_params:array[internal_font_number] of halfword; {how many font
parameters are present}
@!font_name:array[internal_font_number] of str_number; {name of the font}
@!font_area:array[internal_font_number] of str_number; {area of the font}
@!font_bc:array[internal_font_number] of eight_bits;
{beginning (smallest) character code}
@!font_ec:array[internal_font_number] of eight_bits;
{ending (largest) character code}
@!font_glue:array[internal_font_number] of pointer;
{glue specification for interword space, |null| if not allocated}
@!font_used:array[internal_font_number] of boolean;
{has a character from this font actually appeared in the output?}
@!hyphen_char:array[internal_font_number] of integer;
{current \.{\\hyphenchar} values}
@!skew_char:array[internal_font_number] of integer;
{current \.{\\skewchar} values}
@ Besides the arrays just enumerated, we have directory arrays that make it
easy to get at the individual entries in |font_info|. For example, the
|char_info| data for character |c| in font |f| will be in
|font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index|
part of this word (the |b0| field), the width of the character is
|font_info[width_base[f]+w].sc|. (These formulas assume that
|min_quarterword| has already been added to |c| and to |w|, since \MF\
stores its quarterwords that way.)
@<Glob...@>=
@!char_base:array[internal_font_number] of integer;
{base addresses for |char_info|}
@!width_base:array[internal_font_number] of integer;
{base addresses for widths}
@!height_base:array[internal_font_number] of integer;
{base addresses for heights}
@!depth_base:array[internal_font_number] of integer;
{base addresses for depths}
@!italic_base:array[internal_font_number] of integer;
{base addresses for italic corrections}
@!lig_kern_base:array[internal_font_number] of integer;
{base addresses for ligature/kerning programs}
@!kern_base:array[internal_font_number] of integer;
{base addresses for kerns}
@!exten_base:array[internal_font_number] of integer;
{base addresses for extensible recipes}
@!param_base:array[internal_font_number] of integer;
{base addresses for font parameters}
@ @<Set init...@>=
for k←font_base to font_max do font_used[k]←false;
@ \MF\ always knows at least one font, namely the null font. It has no
characters, and its seven parameters are all equal to zero.
@<Initialize table...@>=
font_ptr←null_font; fmem_ptr←7;
font_name[null_font]←"nullfont"; font_area[null_font]←"";
hyphen_char[null_font]←"-"; skew_char[null_font]←-1;
font_bc[null_font]←1; font_ec[null_font]←0;
font_size[null_font]←0; font_dsize[null_font]←0;
font_glue[null_font]←null; font_params[null_font]←7;
param_base[null_font]←-1;
font_ident[null_font]←frozen_null_font;
for k←0 to 6 do font_info[k].sc←0;
@ @<Put each...@>=
primitive("nullfont",set_font,null_font);
@!@:null_font_}{\.{\\nullfont} primitive@>
text(frozen_null_font)←"nullfont"; eqtb[frozen_null_font]←eqtb[cur_val];
@ Of course we want to define macros that suppress the detail of how font
information is actually packed, so that we don't have to write things like
$$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$
too often. The \.{WEB} definitions here make |char_info(f)(c)| the
|four_quarters| word of font information corresponding to character
|c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be
the character's width, so that the long formula above is at least
abbreviated to
$$\hbox{|char_width(f)(char_info(f)(c))|.}$$
Usually, of course, we will fetch |q| first and look at several of its
fields at the same time.
The italic correction of a character will be denoted by
|char_italic(f)(q)|, so it is analogous to |char_width|. But we will get
at the height and depth in a slightly different way, since we usually want
to compute both height and depth if we want either one. The value of
|height_depth(q)| will be the 8-bit quantity
$$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we
will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and
depth of the character |c| for which |q=char_info(f)(c)|. Got that?
The tag field will be called |char_tag(q)|; the remainder byte will be
called |rem_byte(q)|, using a macro that we have already defined above.
Access to a character's |height|, |depth|, and |tag| fields is part of
\MF's inner loop, so we want these macros to produce code that is as fast
as possible under the circumstances.
@↑inner loop@>
@d char_info_end(#)==#].qqqq
@d char_info(#)==font_info[char_base[#]+char_info_end
@d char_width_end(#)==#.b0].sc
@d char_width(#)==font_info[width_base[#]+char_width_end
@d char_exists(#)==(#.b0>min_quarterword)
@d char_italic_end(#)==(qo(#.b2)) div 4].sc
@d char_italic(#)==font_info[italic_base[#]+char_italic_end
@d height_depth(#)==qo(#.b1)
@d char_height_end(#)==(#) div 16].sc
@d char_height(#)==font_info[height_base[#]+char_height_end
@d char_depth_end(#)==# mod 16].sc
@d char_depth(#)==font_info[depth_base[#]+char_depth_end
@d char_tag(#)==((qo(#.b2)) mod 4)
@ The global variable |null_character| is set up to be a word of
|char_info| for a character that doesn't exist. Such a word provides a
convenient way to deal with erroneous situations.
@<Glob...@>=
@!null_character:four_quarters; {nonexistent character information}
@ @<Set init...@>=
null_character.b0←min_quarterword; null_character.b1←min_quarterword;
null_character.b2←min_quarterword; null_character.b3←min_quarterword;
@ Here are some macros that help process ligatures and kerns.
We write |char_kern(f)(j)| to find the amount of kerning specified by
kerning command |j|.
@d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program}
@d char_kern_end(#)==rem_byte(#)].sc
@d char_kern(#)==font_info[kern_base[#]+char_kern_end
@ Font parameters are referred to as |slant(f)|, |space(f)|, etc.
@d param_end(#)==param_base[#]].sc
@d param(#)==font_info[#+param_end
@d slant==param(slant_code) {slant to the right, per unit distance upward}
@d space==param(space_code) {normal space between words}
@d space_stretch==param(space_stretch_code) {stretch between words}
@d space_shrink==param(space_shrink_code) {shrink between words}
@d x_height==param(x_height_code) {one ex}
@d quad==param(quad_code) {one em}
@d extra_space==param(extra_space_code) {additional space at end of sentence}
@<The em width for |cur_font|@>=quad(cur_font)
@ @<The x-height for |cur_font|@>=x_height(cur_font)
@ \MF\ checks the information of a \.{TFM} file for validity as the
file is being read in, so that no further checks will be needed when
typesetting is going on. The somewhat tedious subroutine that does this
is called |read_font_info|. It has four parameters: the user font
identifier~|u|, the file name and area strings |nom| and |aire|, and the
``at'' size~|s|. If |s|~is negative, its the negative of a scale factor
to be applied to the design size; |s=-1000| is the normal case.
Otherwise |s| will be substituted for the design size; in this
case, |s| must be positive and less than $2048\rm\,pt$
(i.e., it must be less than $2↑{27}$ when considered as an integer).
The subroutine opens and closes a global file variable called |tfm_file|.
It returns the value of the internal font number that was just loaded.
If an error is detected, an error message is issued and no font
information is stored; |null_font| is returned in this case.
@d bad_tfm=11 {label for |read_font_info|}
@d abort==goto bad_tfm {do this when the \.{TFM} data is wrong}
@p function read_font_info(@!u:pointer;@!nom,@!aire:str_number;
@!s:scaled):internal_font_number; {input a \.{TFM} file}
label done,bad_tfm,not_found;
var k:0..font_mem_size; {index into |font_info|}
@!file_opened:boolean; {was |tfm_file| successfully opened?}
@!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:halfword;
{sizes of subfiles}
@!f:internal_font_number; {the new font's number}
@!g:internal_font_number; {the number to return}
@!a,@!b,@!c,@!d:eight_bits; {byte variables}
@!qw:four_quarters;@!sw:scaled; {accumulators}
@!z:scaled; {the design size or the ``at'' size}
@!alpha:integer;@!beta:1..16;
{auxiliary quantities used in fixed-point multiplication}
begin g←null_font;@/
@<Read and check the font data; |abort| if the \.{TFM} file is
malformed; if there's no room for this font, say so and |goto
done|; otherwise |incr(font_ptr)| and |goto done|@>;
bad_tfm: @<Report that the font won't be loaded@>;
done: b_close(tfm_file); read_font_info←g;
end;
@ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert
between the \.{TFM} format and a symbolic property-list format
that can be easily edited. These programs contain extensive
diagnostic information, so \MF\ does not have to bother giving
precise details about why it rejects a particular \.{TFM} file.
@.TFtoPL@> @.PLtoTF@>
@d start_font_error_message==print_err("Font "); sprint_cs(u);
print_char("="); print_file_name(nom,aire,"");
if s≥0 then
begin print(" at "); print_scaled(s); print("pt");
end
else if s≠-1000 then
begin print(" scaled "); print_int(-s);
end
@<Report that the font won't be loaded@>=
start_font_error_message;
@.Font x=xx not loadable...@>
if file_opened then print(" not loadable: Bad metric (TFM) file")
else print(" not loadable: Metric (TFM) file not found");
help5("I wasn't able to read the size data for this font,")@/
("so I will ignore the font specification.")@/
("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/
("You might try inserting a different font spec;")@/
("e.g., type `I\font<same font id>=<substitute font name>'.");
error
@ @<Read and check...@>=
@<Open |tfm_file| for input@>;
@<Read the {\.{TFM}} size fields@>;
@<Use size fields to allocate font information@>;
@<Read the {\.{TFM}} header@>;
@<Read character data@>;
@<Read box dimensions@>;
@<Read ligature/kern program@>;
@<Read extensible character recipes@>;
@<Read font parameters@>;
@<Make final adjustments and |goto done|@>
@ @<Open |tfm_file| for input@>=
file_opened←false;
if aire="" then pack_file_name(nom,MF_font_area,".tfm")
else pack_file_name(nom,aire,".tfm");
if not b_open_in(tfm_file) then abort;
file_opened←true
@ Note: A malformed \.{TFM} file might be shorter than it claims to be;
thus |eof(tfm_file)| might be true when |read_font_info| refers to
|tfm_file↑| or when it says |get(tfm_file)|. If such circumstances
cause system error messages, you will have to defeat them somehow,
for example by defining |fget| to be `\ignorespaces|begin get(tfm_file);|
|if eof(tfm_file) then abort; end|\unskip'.
@↑system dependencies@>
@d fget==get(tfm_file)
@d fbyte==tfm_file↑
@d read_sixteen(#)==begin #←fbyte;
if #>127 then abort;
fget; #←#*@'400+fbyte;
end
@d store_four_quarters(#)==begin fget; a←fbyte; qw.b0←qi(a);
fget; b←fbyte; qw.b1←qi(b);
fget; c←fbyte; qw.b2←qi(c);
fget; d←fbyte; qw.b3←qi(d);
#←qw;
end
@ @<Read the {\.{TFM}} size fields@>=
begin read_sixteen(lf);
fget; read_sixteen(lh);
fget; read_sixteen(bc);
fget; read_sixteen(ec);
if (bc>ec+1)∨(ec>255) then abort;
fget; read_sixteen(nw);
fget; read_sixteen(nh);
fget; read_sixteen(nd);
fget; read_sixteen(ni);
fget; read_sixteen(nl);
fget; read_sixteen(nk);
fget; read_sixteen(ne);
fget; read_sixteen(np);
if lf≠6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort;
end
@ The preliminary settings of the index variables |char_base|,
|width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be
corrected later by subtracting |min_quarterword| from them; and we will
subtract 1 from |param_base| too. It's best to forget about such anomalies
until later.
@<Use size fields to allocate font information@>=
lf←lf-6-lh; {|lf| words should be loaded into |font_info|}
if np<7 then lf←lf+7-np; {at least seven parameters will appear}
if (font_ptr=font_max)∨(fmem_ptr+lf>font_mem_size) then
@<Apologize for not loading the font, |goto done|@>;
f←font_ptr+1;
char_base[f]←fmem_ptr-bc;
width_base[f]←char_base[f]+ec+1;
height_base[f]←width_base[f]+nw;
depth_base[f]←height_base[f]+nh;
italic_base[f]←depth_base[f]+nd;
lig_kern_base[f]←italic_base[f]+ni;
kern_base[f]←lig_kern_base[f]+nl;
exten_base[f]←kern_base[f]+nk;
param_base[f]←exten_base[f]+ne
@ @<Apologize for not loading...@>=
begin start_font_error_message;
print(" not loaded: Not enough room left");
@.Font x=xx not loaded...@>
help4("I'm afraid I won't be able to make use of this font,")@/
("because my memory for character-size data is too small.")@/
("If you're really stuck, ask a wizard to enlarge me.")@/
("Or maybe try `I\font<same font id>=<name of loaded font>'.");
error; goto done;
end
@ Only the first two words of the header are needed by \MF84.
@<Read the {\.{TFM}} header@>=
begin if lh<2 then abort;
store_four_quarters(font_check[f]);
fget; read_sixteen(z); {this rejects a negative design size}
fget; z←z*@'400+fbyte; fget; z←(z*@'20)+(fbyte div@'20);
if z<unity then abort;
while lh>2 do
begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header}
end;
font_dsize[f]←z;
if s≠-1000 then
if s≥0 then z←s
else z←xn_over_d(z,-s,1000);
font_size[f]←z;
end
@ @<Read character data@>=
for k←fmem_ptr to width_base[f]-1 do
begin store_four_quarters(font_info[k].qqqq);
if (a≥nw)∨(b div @'20≥nh)∨(b mod @'20≥nd)∨
(c div 4≥ni) then abort;
case c mod 4 of
lig_tag: if d≥nl then abort;
ext_tag: if d≥ne then abort;
list_tag: @<Check for charlist cycle@>;
othercases do_nothing {|no_tag|}
endcases;
end
@ We want to make sure that there is no cycle of characters linked together
by |list_tag| entries, since such a cycle would get \MF\ into an endless
loop. If such a cycle exists, the routine here detects it when processing
the largest character code in the cycle.
@d check_byte_range(#)==begin if (#<bc)∨(#>ec) then abort@+end
@d current_character_being_worked_on==k+bc-fmem_ptr
@<Check for charlist cycle@>=
begin check_byte_range(d);
while d<current_character_being_worked_on do
begin qw←char_info(f)(d);
{N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet}
if char_tag(qw)≠list_tag then goto not_found;
d←qo(rem_byte(qw)); {next character on the list}
end;
if d=current_character_being_worked_on then abort; {yes, there's a cycle}
not_found:end
@ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents
the number
$$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr
b\cdot2↑{-4}+c\cdot2↑{-12}+d\cdot2↑{-20}&a=0;\cr
-16+b\cdot2↑{-4}+c\cdot2↑{-12}+d\cdot2↑{-20}&a=255.\cr}}\right.$$
(No other choices of |a| are allowed, since the magnitude of a number in
design-size units must be less than 16.) We want to multiply this
quantity by the integer~|z|, which is known to be less than $2↑{27}$. Let
$\alpha=16z$. If $|z|<2↑{23}$, the individual multiplications $b\cdot z$,
$c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2,
4, 8, or 16, to obtain a multiplier less than $2↑{23}$, and we can
compensate for this later. If |z| has thereby been replaced by
$|z|↑\prime=|z|/2↑e$, let $\beta=2↑{4-e}$; we shall compute
$$\lfloor(b+c\cdot2↑{-8}+d\cdot2↑{-16})\,z↑\prime/\beta\rfloor$$
if $a=0$, or the same quantity minus $\alpha$ if $a=255$.
This calculation must be done exactly, in order to guarantee portability
of \MF\ between computers.
@d store_scaled(#)==begin fget; a←fbyte; fget; b←fbyte;
fget; c←fbyte; fget; d←fbyte;@/
sw←(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta;
if a=0 then #←sw@+else if a=255 then #←sw-alpha@+else abort;
end
@<Read box dimensions@>=
begin @<Replace |z| by $|z|↑\prime$ and compute $\alpha,\beta$@>;
for k←width_base[f] to lig_kern_base[f]-1 do
store_scaled(font_info[k].sc);
if font_info[width_base[f]].sc≠0 then abort; {\\{width}[0] must be zero}
if font_info[height_base[f]].sc≠0 then abort; {\\{height}[0] must be zero}
if font_info[depth_base[f]].sc≠0 then abort; {\\{depth}[0] must be zero}
if font_info[italic_base[f]].sc≠0 then abort; {\\{italic}[0] must be zero}
end
@ @<Replace |z|...@>=
begin alpha←16*z; beta←16;
while z≥@'40000000 do
begin z←half(z); beta←half(beta);
end;
end
@ @<Read ligature/kern program@>=
begin for k←lig_kern_base[f] to kern_base[f]-1 do
begin store_four_quarters(font_info[k].qqqq);
check_byte_range(b);
if c<128 then check_byte_range(d) {check ligature}
else if d≥nk then abort; {check kern}
end;
if (nl>0)∧(a<128) then abort; {check for stop bit on last command}
for k←kern_base[f] to exten_base[f]-1 do
store_scaled(font_info[k].sc);
end
@ @<Read extensible character recipes@>=
for k←exten_base[f] to param_base[f]-1 do
begin store_four_quarters(font_info[k].qqqq);
if a≠0 then check_byte_range(a);
if b≠0 then check_byte_range(b);
if c≠0 then check_byte_range(c);
check_byte_range(d);
end
@ We check to see that the \.{TFM} file doesn't end prematurely; but
no error message is given for files having more than |lf| words.
@<Read font parameters@>=
begin for k←1 to np do
if k=1 then {the |slant| parameter is a pure number}
begin fget; sw←fbyte; if sw>127 then sw←sw-256;
fget; sw←sw*@'400+fbyte; fget; sw←sw*@'400+fbyte;
fget; font_info[param_base[f]].sc←
(sw*@'20)+(fbyte div@'20);
end
else store_scaled(font_info[param_base[f]+k-1].sc);
if eof(tfm_file) then abort;
for k←np+1 to 7 do font_info[param_base[f]+k-1].sc←0;
end
@ Now to wrap it up, we have checked all the necessary things about the \.{TFM}
file, and all we need to do is put the finishing touches on the data for
the new font.
@d adjust(#)==#[f]←qo(#[f])
{correct for the excess |min_quarterword| that was added}
@<Make final adjustments...@>=
if np≥7 then font_params[f]←np@+else font_params[f]←7;
hyphen_char[f]←default_hyphen_char; skew_char[f]←default_skew_char;
font_name[f]←nom;
font_area[f]←aire;
font_bc[f]←bc; font_ec[f]←ec; font_glue[f]←null;
adjust(char_base); adjust(width_base); adjust(lig_kern_base);
adjust(kern_base); adjust(exten_base);
decr(param_base[f]);
fmem_ptr←fmem_ptr+lf; font_ptr←f; g←f; goto done
@ Before we forget about the format of these tables, let's deal with two
of \MF's basic scanning routines related to font information.
@<Declare procedures that scan font-related stuff@>=
procedure scan_font_ident;
var f:internal_font_number;
@!m:halfword;
begin @<Get the next non-blank non-call...@>;
if cur_cmd=def_font then f←cur_font
else if cur_cmd=set_font then f←cur_chr
else if cur_cmd=def_family then
begin m←cur_chr; scan_four_bit_int; f←equiv(m+cur_val);
end
else begin print_err("Missing font identifier");
@.Undefined font code@>
help2("I was looking for a control sequence whose")@/
("current meaning has been defined by \font.");
back_error; f←null_font;
end;
cur_val←f;
end;
@ The following routine is used to implement `\.{\\fontdimen} |n| |f|'.
The boolean parameter |writing| is set |true| if the calling program
intends to change the parameter value.
@<Declare procedures that scan font-related stuff@>=
procedure find_font_dimen(@!writing:boolean);
{sets |cur_val| to |font_info| location}
var f:internal_font_number;
@!n:integer; {the parameter number}
begin scan_int; n←cur_val; scan_font_ident; f←cur_val;
if n≤0 then cur_val←fmem_ptr
else begin if writing ∧(n≤space_shrink_code)∧@|
(n≥space_code)∧(font_glue[f]≠null) then
begin delete_glue_ref(font_glue[f]);
font_glue[f]←null;
end;
if n>font_params[f] then
if f<font_ptr then cur_val←fmem_ptr
else @<Increase the number of parameters in the last font@>
else cur_val←n+param_base[f];
end;
@<Issue an error message if |cur_val=fmem_ptr|@>;
end;
@ @<Issue an error message if |cur_val=fmem_ptr|@>=
if cur_val=fmem_ptr then
begin print_err("Font "); sprint_cs(font_ident[f]);
print(" has only "); print_int(font_params[f]);
print(" fontdimen parameters");
@.Font x has only...@>
help2("To increase the number of font parameters, you must")@/
("use \fontdimen immediately after the \font is loaded.");
error;
end
@ @<Increase the number of parameters...@>=
begin repeat if fmem_ptr=font_mem_size then
overflow("font memory",font_mem_size);
@:METAFONT capacity exceeded font memory}{\quad font memory@>
font_info[fmem_ptr].sc←0; incr(fmem_ptr); incr(font_params[f]);
until n=font_params[f];
cur_val←fmem_ptr-1; {this equals |param_base[f]+font_params[f]|}
end
@ When \MF\ wants to typeset a character that doesn't exist, the
character node is not created; thus the output routine can assume
that characters exist when it sees them. The following procedure
prints a warning message unless the user has suppressed it.
@p procedure char_warning(@!f:internal_font_number;@!c:eight_bits);
begin if tracing_lost_chars>0 then
begin begin_diagnostic;
print_nl("Missing character: There is no ");
@.Missing character@>
print_ASCII(c); print(" in font ");
print(font_name[f]); print_char("!"); end_diagnostic(false);
end;
end;
@ Here is a function that returns a pointer to a character node for a
given character in a given font. If that character doesn't exist,
|null| is returned instead.
@p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer;
label exit;
var p:pointer; {newly allocated node}
begin if (font_bc[f]≤c)∧(font_ec[f]≥c) then
if char_exists(char_info(f)(qi(c))) then
begin p←get_avail; font(p)←f; character(p)←qi(c);
new_character←p; return;
end;
char_warning(f,c);
new_character←null;
exit:end;
@* \[xx] Device-independent file format.
The most important output produced by a run of \MF\ is the ``device
independent'' (\.{DVI}) file that specifies where characters and rules
are to appear on printed pages. The form of these files was designed by
David R. Fuchs in 1979. Almost any reasonable device can be driven by
@↑Fuchs, David Raymond@>
@:DVI_files}{\.{DVI} files@>
a program that takes \.{DVI} files as input, and dozens of such
\.{DVI}-to-whatever programs have been written. Thus, it is possible to
print the output of \MF\ on many different kinds of equipment, using \MF\
as a device-independent ``front end.''
A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a
series of commands in a machine-like language. The first byte of each command
is the operation code, and this code is followed by zero or more bytes
that provide parameters to the command. The parameters themselves may consist
of several consecutive bytes; for example, the `|set_rule|' command has two
parameters, each of which is four bytes long. Parameters are usually
regarded as nonnegative integers; but four-byte-long parameters,
and shorter parameters that denote distances, can be
either positive or negative. Such parameters are given in two's complement
notation. For example, a two-byte-long distance parameter has a value between
$-2↑{15}$ and $2↑{15}-1$.
A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one
or more ``pages,'' followed by a ``postamble.'' The preamble is simply a
|pre| command, with its parameters that define the dimensions used in the
file; this must come first. Each ``page'' consists of a |bop| command,
followed by any number of other commands that tell where characters are to
be placed on a physical page, followed by an |eop| command. The pages
appear in the order that \MF\ generated them. If we ignore |nop| commands
and \\{fnt\_def} commands (which are allowed between any two commands in
the file), each |eop| command is immediately followed by a |bop| command,
or by a |post| command; in the latter case, there are no more pages in the
file, and the remaining bytes form the postamble. Further details about
the postamble will be explained later.
Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte
quantities that give the location number of some other byte in the file;
the first byte is number~0, then comes number~1, and so on. For example,
one of the parameters of a |bop| command points to the previous |bop|;
this makes it feasible to read the pages in backwards order, in case the
results are being directed to a device that stacks its output face up.
Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the
first page occupies bytes 100 to 999, say, and if the second
page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000
points to 100 and the |bop| that starts in byte 2000 points to 1000. (The
very first |bop|, i.e., the one that starts in byte 100, has a pointer of $-1$.)
@ The \.{DVI} format is intended to be both compact and easily interpreted
by a machine. Compactness is achieved by making most of the information
implicit instead of explicit; when a \.{DVI}-reading program reads the
commands for a page, it keeps track of several quantities: (a)~The current
font |f| is an integer; this value is changed only
by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page
is given by two numbers called the horizontal and vertical coordinates,
|h| and |v|. Both coordinates are zero at the upper left corner of the page;
moving to the right corresponds to increasing the horizontal coordinate, and
moving down corresponds to increasing the vertical coordinate. Thus, the
coordinates are essentially Cartesian, except that vertical directions are
flipped; the Cartesian version of |(h,v)| would be |(h,-v)|. (c)~The
current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|,
where |w| and~|x| are used for horizontal spacing and where |y| and~|z|
are used for vertical spacing. (d)~There is a stack containing
|(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to
change the current level of operation. Note that the current font~|f| is
not pushed and popped; the stack contains only information about
positioning.
The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up
to 32 bits, including the sign. Since they represent physical distances,
there is a small unit of measurement such that increasing |h| by~1 means
moving a certain tiny distance to the right. The actual unit of
measurement is variable, as explained below; \MF\ sets things up so that
its \.{DVI} output is in sp units, i.e., scaled points, in agreement with
all the |scaled| dimensions in \MF's data structures.
@ Here is list of all the commands that may appear in a \.{DVI} file. With
each command we give its symbolic name (e.g., |bop|), its opcode byte
(e.g., 139), and its parameters (if any). The parameters are followed
by a bracketed number telling how many bytes they occupy; for example,
`|p[4]|' means that parameter |p| is four bytes long.
\yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f|
such that the reference point of the character is at |(h,v)|. Then
increase |h| by the width of that character. Note that a character may
have zero or negative width, so one cannot be sure that |h| will advance
after this command; but |h| usually does increase.
\yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127).
Do the operations of |set_char_0|, but use the appropriate character number
instead of character~0.
\yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character
number~|c| is typeset. \MF84 uses this command for characters in the
range |128≤c<256|.
\yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two
bytes long, so it is in the range |0≤c<65536|. \MF84 never uses this
command, but it should come in handy for extensions of \MF\ that deal
with oriental languages.
@↑oriental characters@>@↑Chinese characters@>@↑Japanese characters@>
\yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three
bytes long, so it can be as large as $2↑{24}-1$. Not even the Chinese
language has this many characters, but this command might prove useful
in some yet unforeseen extension.
\yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four
bytes long. Imagine that.
\yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle
of height~|a| and width~|b|, with its bottom left corner at |(h,v)|. Then
set |h←h+b|. If either |a≤0| or |b≤0|, nothing should be typeset. Note
that if |b<0|, the value of |h| will decrease even though nothing else happens.
See below for details about how to typeset rules so that consistency with
\MF\ is guaranteed.
\yskip\hang|put1| 133 |c[1]|. Typeset character number~|c| from font~|f|
such that the reference point of the character is at |(h,v)|. (The `put'
commands are exactly like the `set' commands, except that they simply put out a
character or a rule without moving the reference point afterwards.)
\yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed.
\yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed.
\yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed.
\yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that
|h| is not changed.
\yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s
may occur between \.{DVI} commands, but a |nop| cannot be inserted between
a command and its parameters or between two parameters.
\yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning
of a page: Set |(h,v,w,x,y,z)←(0,0,0,0,0,0)| and set the stack empty. Set
the current font |f| to an undefined value. The ten $c_i$ parameters hold
the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ λλλ at the time
\.{\\shipout} was invoked for this page; they can be used to identify
pages, if a user wants to print only part of a \.{DVI} file. The parameter
|p| points to the previous |bop| command in the file, where the first
|bop| has $p=-1$.
\yskip\hang|eop| 140. End of page: Print what you have read since the
previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading
programs that drive most output devices will have kept a buffer of the
material that appears on the page that has just ended. This material is
largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by
|h|~coordinate; so it usually needs to be sorted into some order that is
appropriate for the device in question.)
\yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the
top of the stack; do not change any of these values. Note that |f| is
not pushed.
\yskip\hang|pop| 142. Pop the top six values off of the stack and assign
them respectively to |(h,v,w,x,y,z)|. The number of pops should never
exceed the number of pushes, since it would be highly embarrassing if the
stack were empty at the time of a |pop| command.
\yskip\hang|right1| 143 |b[1]|. Set |h←h+b|, i.e., move right |b| units.
The parameter is a signed number in two's complement notation, |-128≤b<128|;
if |b<0|, the reference point actually moves left.
\yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a
two-byte quantity in the range |-32768≤b<32768|.
\yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a
three-byte quantity in the range |@t$-2↑{23}$@>≤b<@t$2↑{23}$@>|.
\yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a
four-byte quantity in the range |@t$-2↑{31}$@>≤b<@t$2↑{31}$@>|.
\yskip\hang|w0| 147. Set |h←h+w|; i.e., move right |w| units. With luck,
this parameterless command will usually suffice, because the same kind of motion
will occur several times in succession; the following commands explain how
|w| gets particular values.
\yskip\hang|w1| 148 |b[1]|. Set |w←b| and |h←h+b|. The value of |b| is a
signed quantity in two's complement notation, |-128≤b<128|. This command
changes the current |w|~spacing and moves right by |b|.
\yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long,
|-32768≤b<32768|.
\yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long,
|@t$-2↑{23}$@>≤b<@t$2↑{23}$@>|.
\yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long,
|@t$-2↑{31}$@>≤b<@t$2↑{31}$@>|.
\yskip\hang|x0| 152. Set |h←h+x|; i.e., move right |x| units. The `|x|'
commands are like the `|w|' commands except that they involve |x| instead
of |w|.
\yskip\hang|x1| 153 |b[1]|. Set |x←b| and |h←h+b|. The value of |b| is a
signed quantity in two's complement notation, |-128≤b<128|. This command
changes the current |x|~spacing and moves right by |b|.
\yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long,
|-32768≤b<32768|.
\yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long,
|@t$-2↑{23}$@>≤b<@t$2↑{23}$@>|.
\yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long,
|@t$-2↑{31}$@>≤b<@t$2↑{31}$@>|.
\yskip\hang|down1| 157 |a[1]|. Set |v←v+a|, i.e., move down |a| units.
The parameter is a signed number in two's complement notation, |-128≤a<128|;
if |a<0|, the reference point actually moves up.
\yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a
two-byte quantity in the range |-32768≤a<32768|.
\yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a
three-byte quantity in the range |@t$-2↑{23}$@>≤a<@t$2↑{23}$@>|.
\yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a
four-byte quantity in the range |@t$-2↑{31}$@>≤a<@t$2↑{31}$@>|.
\yskip\hang|y0| 161. Set |v←v+y|; i.e., move down |y| units. With luck,
this parameterless command will usually suffice, because the same kind of motion
will occur several times in succession; the following commands explain how
|y| gets particular values.
\yskip\hang|y1| 162 |a[1]|. Set |y←a| and |v←v+a|. The value of |a| is a
signed quantity in two's complement notation, |-128≤a<128|. This command
changes the current |y|~spacing and moves down by |a|.
\yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long,
|-32768≤a<32768|.
\yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long,
|@t$-2↑{23}$@>≤a<@t$2↑{23}$@>|.
\yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long,
|@t$-2↑{31}$@>≤a<@t$2↑{31}$@>|.
\yskip\hang|z0| 166. Set |v←v+z|; i.e., move down |z| units. The `|z|' commands
are like the `|y|' commands except that they involve |z| instead of |y|.
\yskip\hang|z1| 167 |a[1]|. Set |z←a| and |v←v+a|. The value of |a| is a
signed quantity in two's complement notation, |-128≤a<128|. This command
changes the current |z|~spacing and moves down by |a|.
\yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long,
|-32768≤a<32768|.
\yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long,
|@t$-2↑{23}$@>≤a<@t$2↑{23}$@>|.
\yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long,
|@t$-2↑{31}$@>≤a<@t$2↑{31}$@>|.
\yskip\hang|fnt_num_0| 171. Set |f←0|. Font 0 must previously have been
defined by a \\{fnt\_def} instruction, as explained below.
\yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set
|f←1|, \dots, \hbox{|f←63|}, respectively.
\yskip\hang|fnt1| 235 |k[1]|. Set |f←k|. \MF84 uses this command for font
numbers in the range |64≤k<256|.
\yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two
bytes long, so it is in the range |0≤k<65536|. \MF84 never generates this
command, but large font numbers may prove useful for specifications of
color or texture, or they may be used for special fonts that have fixed
numbers in some external coding scheme.
\yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three
bytes long, so it can be as large as $2↑{24}-1$.
\yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four
bytes long; this is for the really big font numbers (and for the negative ones).
\yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in
general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading
programs are being used. \MF84 generates |xxx1| when a short enough λλλ
\.{\\special} appears, setting |k| to the number of bytes being sent. It
is recommended that |x| be a string having the form of a keyword followed
by possible parameters relevant to that keyword.
\yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0≤k<65536|.
\yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0≤k<@t$2↑{24}$@>|.
\yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously
large. \MF84 uses |xxx4| when sending a string of length 256 or more.
\yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
Define font |k|, where |0≤k<63|; font definitions will be explained shortly.
\yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
Define font |k|, where |0≤k<65536|.
\yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
Define font |k|, where |0≤k<@t$2↑{24}$@>|.
\yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.
Define font |k|, where |@t$-2↑{31}$@>≤k<@t$2↑{30}$@>|.
\yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|.
Beginning of the preamble; this must come at the very beginning of the
file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below.
\yskip\hang|post| 248. Beginning of the postamble, see below.
\yskip\hang|post_post| 249. Ending of the postamble, see below.
\yskip\noindent Commands 250--255 are undefined at the present time.
@ @d set_char_0=0 {typeset character 0 and move right}
@d set1=128 {typeset a character and move right}
@d set_rule=132 {typeset a rule and move right}
@d put1=133 {typeset a character}
@d put_rule=137 {typeset a rule}
@d nop=138 {no operation}
@d bop=139 {beginning of page}
@d eop=140 {ending of page}
@d push=141 {save the current positions}
@d pop=142 {restore previous positions}
@d right1=143 {move right}
@d w0=147 {move right by |w|}
@d w1=148 {move right and set |w|}
@d x0=152 {move right by |x|}
@d x1=153 {move right and set |x|}
@d down1=157 {move down}
@d y0=161 {move down by |y|}
@d y1=162 {move down and set |y|}
@d z0=166 {move down by |z|}
@d z1=167 {move down and set |z|}
@d fnt_num_0=171 {set current font to 0}
@d fnt1=235 {set current font}
@d xxx1=239 {extension to \.{DVI} primitives}
@d xxx4=242 {potentially long extension to \.{DVI} primitives}
@d fnt_def1=243 {define the meaning of a font number}
@d pre=247 {preamble}
@d post=248 {postamble beginning}
@d post_post=249 {postamble ending}
@ The preamble contains basic information about the file as a whole. As
stated above, there are six parameters:
$$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$
The |i| byte identifies \.{DVI} format; currently this byte is always set
to~2. (Some day we will set |i=3|, when \.{DVI} format makes another
incompatible change---perhaps in 1992.)
The next two parameters, |num| and |den|, are positive integers that define
the units of measurement; they are the numerator and denominator of a
fraction by which all dimensions in the \.{DVI} file could be multiplied
in order to get lengths in units of $10↑{-7}$ meters. Since $\rm 7227{pt} =
254{cm}$, and since \MF\ works with scaled points where there are $2↑{16}$
sp in a point, \MF\ sets
$|num|/|den|=(254\cdot10↑5)/(7227\cdot2↑{16})=25400000/473628672$.
@↑sp@>
The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the
desired magnification. The actual fraction by which dimensions are
multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \MF\ λλλ
source document does not call for any `\.{true}' dimensions, and if you
change it only by specifying a different \.{\\mag} setting, the \.{DVI}
file that \MF\ creates will be completely unchanged except for the value
of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow
users to override the |mag|~setting when a \.{DVI} file is being printed.)
Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not
interpreted further. The length of comment |x| is |k|, where |0≤k<256|.
@d id_byte=2 {identifies the kind of \.{DVI} files described here}
@ Font definitions for a given font number |k| contain further parameters
$$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$
The four-byte value |c| is the check sum that \MF\ found in the \.{TFM}
file for this font; |c| should match the check sum of the font found by
programs that read this \.{DVI} file.
@↑check sum@>
Parameter |s| contains a fixed-point scale factor that is applied to the
character widths in font |k|; font dimensions in \.{TFM} files and other
font files are relative to this quantity, which is called the ``at size''
elsewhere in this documentation. The value of |s| is always positive and
less than $2↑{27}$. It is given in the same units as the other \.{DVI}
dimensions, i.e., in sp when \MF84 has made the file. Parameter |d| is
similar to |s|; it is the ``design size,'' and it is given in \.{DVI}
units that have not been corrected for the magnification~|mag| found in the
preamble. Thus, font |k| is to be used at $|mag|\cdot s/1000d$ times its
normal size.
The remaining part of a font definition gives the external name of the font,
which is an ASCII string of length |a+l|. The number |a| is the length
of the ``area'' or directory, and |l| is the length of the font name itself;
the standard local system font area is supposed to be used when |a=0|.
The |n| field contains the area in its first |a| bytes.
Font definitions must appear before the first use of a particular font number.
Once font |k| is defined, it must not be defined again; however, we
shall see below that font definitions appear in the postamble as well as
in the pages, so in this sense each font number is defined exactly twice,
if at all. Like |nop| commands, font definitions can appear before the first
|bop|, or between an |eop| and a |bop|.
@ Sometimes it is desirable to make horizontal or vertical rules line up
precisely with certain features in characters of a font. It is possible to
guarantee the correct matching between \.{DVI} output and the characters
generated by \MF\ by adhering to the following principles: (1)~The \MF\
characters should be positioned so that a bottom edge or left edge that is
supposed to line up with the bottom or left edge of a rule appears at the
reference point, i.e., in row~0 and column~0 of the \MF\ raster. This
ensures that the position of the rule will not be rounded differently when
the pixel size is not a perfect multiple of the units of measurement in
the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$
should be equivalent to a \MF-generated character having black pixels in
precisely those raster positions whose \MF\ coordinates satisfy
|0≤x<@t$\alpha$@>b| and |0≤y<@t$\alpha$@>a|, where $\alpha$ is the number
of pixels per \.{DVI} unit.
@:METAFONT}{\MF@>
@↑alignment of rules with characters@>
@↑rules aligning with characters@>
@ The last page in a \.{DVI} file is followed by `|post|'; this command
introduces the postamble, which summarizes important facts that \MF\ has
accumulated about the file, making it possible to print subsets of the data
with reasonable efficiency. The postamble has the form
$$\vbox{\halign{\hbox{#\hfil}\cr
|post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr
$\langle\,$font definitions$\,\rangle$\cr
|post_post| |q[4]| |i[1]| 223's|[≥4]|\cr}}$$
Here |p| is a pointer to the final |bop| in the file. The next three
parameters, |num|, |den|, and |mag|, are duplicates of the quantities that
appeared in the preamble.
Parameters |l| and |u| give respectively the height-plus-depth of the tallest
page and the width of the widest page, in the same units as other dimensions
of the file. These numbers might be used by a \.{DVI}-reading program to
position individual ``pages'' on large sheets of film or paper.
Parameter |s| is the maximum stack depth (i.e., the largest excess of
|push| commands over |pop| commands) needed to process this file. Then
comes |t|, the total number of pages (|bop| commands) present.
The postamble continues with font definitions, which are any number of
\\{fnt\_def} commands as described above, possibly interspersed with |nop|
commands. Each font number that is used in the \.{DVI} file must be defined
exactly twice: Once before it is first selected by a \\{fnt} command, and once
in the postamble.
@ The last part of the postamble, following the |post_post| byte that
signifies the end of the font definitions, contains |q|, a pointer to the
|post| command that started the postamble. An identification byte, |i|,
comes next; this currently equals~2, as in the preamble.
The |i| byte is followed by four or more bytes that are all equal to
the decimal number 223 (i.e., @'337 in octal). \MF\ puts out four to seven of
these trailing bytes, until the total length of the file is a multiple of
four bytes, since this works out best on machines that pack four bytes per
word; but any number of 223's is allowed, as long as there are at least four
of them. In effect, 223 is a sort of signature that is added at the very end.
@↑Fuchs, David Raymond@>
This curious way to finish off a \.{DVI} file makes it feasible for
\.{DVI}-reading programs to find the postamble first, on most computers,
even though \MF\ wants to write the postamble last. Most operating
systems permit random access to individual words or bytes of a file, so
the \.{DVI} reader can start at the end and skip backwards over the 223's
until finding the identification byte. Then it can back up four bytes, read
|q|, and move to byte |q| of the file. This byte should, of course,
contain the value 248 (|post|); now the postamble can be read, so the
\.{DVI} reader discovers all the information needed for typesetting the
pages. Note that it is also possible to skip through the \.{DVI} file at
reasonably high speed to locate a particular page, if that proves
desirable. This saves a lot of time, since \.{DVI} files used in production
jobs tend to be large.
Unfortunately, however, standard \PASCAL\ does not include the ability to
@↑system dependencies@>
access a random position in a file, or even to determine the length of a file.
Almost all systems nowadays provide the necessary capabilities, so \.{DVI}
format has been designed to work most efficiently with modern operating systems.
But if \.{DVI} files have to be processed under the restrictions of standard
\PASCAL, one can simply read them from front to back, since the necessary
header information is present in the preamble and in the font definitions.
(The |l| and |u| and |s| and |t| parameters, which appear only in the
postamble, are ``frills'' that are handy but not absolutely necessary.)
@* \[52] Debugging.
Once \MF\ is working, it should be possible to diagnose most errors with
the \.{show} commands and other diagnostic features. But for the initial
stages of debugging, and for the revelation of really deep mysteries, you
can compile \MF\ with a few more aids including the \PASCAL\ runtime
checks and its debugger. An additional routine called |debug_help|
will also come into play when you type `\.D' after an error message;
|debug_help| also occurs just before a fatal error causes \MF\ to succumb.
@↑debugging@>
@↑system dependencies@>
The interface to |debug_help| is primitive, but it is good enough when used
with a \PASCAL\ debugger that allows you to set breakpoints and to read
variables and change their values. After getting the prompt `\.{debug \#}', you
type either a negative number (this exits |debug_help|), or zero (this
goes to a location where you can set a breakpoint, thereby entering into
dialog with the \PASCAL\ debugger), or a positive number |m| followed by
an argument |n|. The meaning of |m| and |n| will be clear from the
program below. (If |m=13|, there is an additional argument, |l|.)
@.\#@>
@d breakpoint=888 {place where a breakpoint is desirable}
@<Last-minute...@>=
@!debug procedure debug_help; {routine to display various things}
label breakpoint,exit;
var k,@!l,@!m,@!n:integer;
begin loop begin wake_up_terminal;
print_nl("debug # (-1 to exit):"); update_terminal;
@.debug \#@>
read(term_in,m);
if m<0 then return
else if m=0 then
begin goto breakpoint;@\ {go to every label at least once}
breakpoint: m←0; @{'BREAKPOINT'@}@\
end
else begin read(term_in,n);
case m of
@t\4@>@<Numbered cases for |debug_help|@>@;
othercases print("?")
endcases;
end;
end;
exit:end;
gubed
@ @<Numbered cases...@>=
1: print_word(mem[n]); {display |mem[n]| in all forms}
2: print_int(info(n));
3: print_int(link(n));
4: print_word(eqtb[n]);
5: print_word(font_info[n]);
6: print_word(save_stack[n]);
7: show_box(n);
{show a box, abbreviated by |show_box_depth| and |show_box_breadth|}
8: begin breadth_max←10000; depth_threshold←pool_size-pool_ptr-10;
show_node_list(n); {show a box in its entirety}
end;
9: show_token_list(n,null,1000);
10: print(n);
11: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|}
12: search_mem(n); {look for pointers to |n|}
13: begin read(term_in,l); print_cmd_chr(n,l);
end;
14: for k←0 to n do print(buffer[k]);
15: begin font_in_short_display←null_font; short_display(n);
end;
16: panicking←not panicking;
@* \[54] System-dependent changes.
This section should be replaced, if necessary, by changes to the program
that are necessary to make \MF\ work at a particular installation.
It is usually best to design your change file so that all changes to
previous sections preserve the section numbering; then everybody's version
will be consistent with the published program. More extensive changes,
which introduce new sections, can be inserted here; then only the index
itself will get a new section number.
@↑system dependencies@>
@* \[55] Index.
Here is where you can find all uses of each identifier in the program,
with underlined entries pointing to where the identifier was defined.
If the identifier is only one letter long, however, you get to see only
the underlined entries. All references are to section numbers instead of
page numbers.
This index also lists error messages and other aspects of the program
that you might want to look up some day. For example, the entry
for ``system dependencies'' lists all sections that should receive
special attention from people who are installing \MF\ in a new
operating environment. A list of various things that can't happen appears
under ``this can't happen''. Approximately 40 sections are listed under
``inner loop''; these account for about 60\% of \MF's running time,
exclusive of input and output.